Re: Populating missing dates in postgresql data - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Populating missing dates in postgresql data
Date
Msg-id D5409921-71A5-40EC-B63D-EA50DD2DC329@gmail.com
Whole thread Raw
In response to Populating missing dates in postgresql data  ("Lavrenz, Steven M" <slavrenz@purdue.edu>)
List pgsql-general
> On 26 Mar 2015, at 1:25, Lavrenz, Steven M <slavrenz@purdue.edu> wrote:
>
> Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a
databaseof hardware objects, each of which has several different “channels”. Once per day, these channels are supposed 
>
>
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
> 286                         2                              2014-12-01
>
> 286                         2                              2014-12-02
>
> 286                         5                              2014-12-01
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
>
>
> And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. object 286
losescommunications on 12/1/2014. Then the table might look like:    
>
>
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
>
>
> 286                         2                              2014-12-02
>
>
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
>
>
> Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have:
>
>
>
> object_id             channel                check-in date
>
> ****************************************
>
> 990                         1                              2014-12-01
>
> 990                         1                              2014-12-02
>
> 990                         2                              2014-12-01
>
> 990                         2                              2014-12-02
>
>
>
> 286                         2                              2014-12-02
>
> 286                         5                              2014-12-01
>
> 286                         5                              2014-12-02
>
> 4507                       1                              2014-12-01
>
> 4507                       1                              2014-12-02
>
> 4507                       2                              2014-12-01
>
> 4507                       2                              2014-12-02
>
>
>
> I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each
day.For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So,
for
>
>
>
> object_id             channel                check-in date                     comm failure
>
> **********************************************************
>
> 990                         1                              2014-12-01                          No
>
> 990                         1                              2014-12-02                          No
>
> 990                         2                              2014-12-01                          No
>
> 990                         2                              2014-12-02                          No
>
> 286                         2                              2014-12-01                          Yes
>
> 286                         2                              2014-12-02                          No
>
> 286                         5                              2014-12-01                          Yes
>
> 286                         5                              2014-12-02                          No
>
> 4507                       1                              2014-12-01                          No
>
> 4507                       1                              2014-12-02                          No
>
> 4507                       2                              2014-12-01                          No
>
> 4507                       2                              2014-12-02                          No
>
>
>
>
>
> I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join
ofTABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single 
>
>
>
> object_id             channel                check-in date                     comm failure
>
> **********************************************************
>
> 990                         1                              2014-12-01                          No
>
> 990                         1                              2014-12-02                          No
>
> 990                         2                              2014-12-01                          No
>
> 990                         2                              2014-12-02                          No
>
> 286                         2                                                                              Yes
>
> 286                         2                              2014-12-02                          No
>
> 286                         5                                                                              Yes
>
> 286                         5                              2014-12-02                          No
>
> 4507                       1                              2014-12-01                          No
>
> 4507                       1                              2014-12-02                          No
>
> 4507                       2                              2014-12-01                          No
>
> 4507                       2                              2014-12-02                          No
>
>
>
> I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help!


Easiest would be to insert the missing values in your table, something like:

WITH RECURSIVE calendar (missing_date) AS (
    SELECT MAX(check_in_date) FROM table_a WHERE comm_failure = 'YES'
    UNION ALL
    SELECT missing_date + interval '1 day' FROM calendar
    WHERE missing_date < CURRENT_DATE
)
INSERT INTO table_a (object_id, channel, check_in_date, comm_failure)
SELECT b.object_id, b.channel. c.missing_date, 'YES'
FROM table_b b, calendar c
WHERE NOT EXISTS (
    SELECT 1
    FROM table_a a
    WHERE a.object_id = b.object_id
    AND a.channel = b.channel
    AND a.check_in_date = c.missing_date
);

That's off the top of my head, untested, etc, but I think I got that mostly right.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Autovacuum query
Next
From: Craig Ringer
Date:
Subject: Re: BDR - triggers on receiving node?