Thread: Find inconsistencies in data with date range
I know I can do this Java, but I'd rather have this running as a Stored Procedure. What I am wanting to do is identifyand potentially correct the summary data for date inconsistencies. We have policies/red flag reports in place tokeep this from happening, but we are now cleaning up history. The query below works on a per store basis, but I'd liketo be able to run this for all stores in the location table.<br /><br /> I've looked at some procedure codes regardinglooping, but everything I try to create seems to give me problems. THe code I'm trying is also below. Does anyonehave any suggestions on how to accomplish this?<br /><br /><br /><br /><u>Working Tables</u><br /> locations - tablecontains store information, startup date, address, etc<br /> daily_salessummary - table holds daily sales summaryby store (summary should be updated nightly). eod_ts is End of Day Timestamp.<br /><br /><u>Query</u><br /> WITHdatelist AS(<br /> SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek<br /> FROM (SELECT date asGenDate<br /> FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date<br /> ) AS t1 <br /> )<br /> SELECT gendate FROM datelist AS t1<br /> WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummaryAS t1<br /> JOIN locations AS t2 ON t1.location_id = t2.row_id<br /> WHERE t2.locationCode = 'US_FL_TAMPA_141')<br /> <br /> AND gendate > (SELECTstart_date FROM locations WHERE locationCode = 'US_FL_TAMPA_141')<br /><br /><br /><u>Desired Output</u> - could outputto an exceptions table<br /> StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01<br /> StoreCode 'MX_OAXACA_SALINA_8344'missing daily summary for 2011-06-05<br /><br /><br /><u>ProcedureSQL</u> (contains unknown errors)<br/> DECLARE<br /> CURSOR location_table IS<br /> SELECT locationCode FROM locations;<br /> BEGIN<br /> FOR thisSymbol IN ticker_tables LOOP<br /> EXECUTE IMMEDIATE 'WITH datelist AS(<br /> SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek<br /> FROM (SELECT date as GenDate<br /> FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval)date<br /> ) AS t1 <br /> )<br /> SELECT gendate FROM datelistAS t1<br /> WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummaryAS t1<br /> JOIN locations AS t2 ONt1.location_id = t2.row_id<br /> WHERE t2.locationCode= '' || location_table.locationCode || '') <br /> AND gendate > (SELECT start_date FROM locations WHERE locationCode = '' ||location_table.locationCode || '')';<br /> END LOOP;<br /> END;<br /><br /><br /><br /><pre class="moz-signature"cols="72">-- Jason Aleski / IT Specialist</pre>
On 03/06/2015 01:38 PM, Jason Aleski wrote: > I know I can do this Java, but I'd rather have this running as a Stored > Procedure. What I am wanting to do is identify and potentially correct > the summary data for date inconsistencies. We have policies/red flag > reports in place to keep this from happening, but we are now cleaning up > history. The query below works on a per store basis, but I'd like to be > able to run this for all stores in the location table. > > I've looked at some procedure codes regarding looping, but everything I > try to create seems to give me problems. THe code I'm trying is also > below. Does anyone have any suggestions on how to accomplish this? > > > > _Working Tables_ > locations - table contains store information, startup date, address, etc > daily_salessummary - table holds daily sales summary by store > (summary should be updated nightly). eod_ts is End of Day Timestamp. > > _Query_ > WITH datelist AS( > SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek > FROM (SELECT date as GenDate > FROM > generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 > day'::interval) date > ) AS t1 > ) > SELECT gendate FROM datelist AS t1 > WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1 > JOIN locations AS t2 ON t1.location_id = t2.row_id > WHERE t2.locationCode = 'US_FL_TAMPA_141') > > AND gendate > (SELECT start_date FROM locations WHERE locationCode = > 'US_FL_TAMPA_141') First in above and in variation below I would probably do some alias renaming. I pretty sure t1 means different things throughout the query, but is hard to follow exactly what. > > > _Desired Output_ - could output to an exceptions table > StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01 > StoreCode 'MX_OAXACA_SALINA_8344' missing daily summary for 2011-06-05 > > > _ProcedureSQL_ (contains unknown errors) > DECLARE > CURSOR location_table IS > SELECT locationCode FROM locations; > BEGIN > FOR thisSymbol IN ticker_tables LOOP > EXECUTE IMMEDIATE 'WITH datelist AS( > SELECT t1.GenDate as > gendate, extract(dow from GenDate) AS dayofweek > FROM (SELECT date as > GenDate > FROM > generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 > day'::interval) date > ) AS t1 > ) > SELECT gendate FROM > datelist AS t1 > WHERE gendate NOT IN > (SELECT t1.eod_ts FROM daily_salessummary AS t1 > JOIN locations AS t2 ON t1.location_id = t2.row_id > WHERE t2.locationCode = '' || location_table.locationCode || '') > AND gendate > (SELECT > start_date FROM locations WHERE locationCode = '' || > location_table.locationCode || '')'; > END LOOP; > END; I do not use cursors enough in plpgsql to be sure, but I think the above definition is incorrect: http://www.postgresql.org/docs/9.4/interactive/plpgsql-cursors.html To reduce the moving parts I would write the function without the cursor and just hardwire the location information to start with to get a working sample. > > > > -- > Jason Aleski / IT Specialist > -- Adrian Klaver adrian.klaver@aklaver.com
Jason Aleksi wrote > I've looked at some procedure codes regarding looping, but everything I > try to create seems to give me problems. THe code I'm trying is also > below. Does anyone have any suggestions on how to accomplish this? I would build a master table of stores and dates and then write a query to update a third field from null to the number of records found for the given combination. When all the nulls are gone you can scan for zeros to figure out what combinations are missing data. If you have a matching index the queries should execute reasonably efficiently and you either call it from a function in the database or externally on one or more threads depending on where you expect to encounter the processing bottleneck. You can process more than one day or store at a time if so desired but there will likely be a point of diminishing returns depending on the volume of data. I would probably do all days for one store in a given year at a time. David J. -- View this message in context: http://postgresql.nabble.com/Find-inconsistencies-in-data-with-date-range-tp5840865p5840891.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Hi,
Something like this?
It inserts error records into a table called locationrep.
create or replace function finderror() returns void as $$
declare
startd date;
daterec record;
begin
select into startd min(startdate) from location; --identify the earliest opening date
--iterating trough dates from that date until now
for daterec in select generate_series::date as repdate from generate_series(startd,now()::date,'1 day'::interval) loop
--insert ito the error table all the shops which not have an entry from the current date and opened before the said date
insert into locationrep select shop,daterec.repdate from location where startdate<=daterec.repdate and not exists(select 1 from daily_salessummary where shop=location.shop and reportdate=daterec.repdate);
end loop;
end;
$$ language plpgsql;
Regards,
Sándor Daku
On 6 March 2015 at 22:38, Jason Aleski <jason.aleski@gmail.com> wrote:
I know I can do this Java, but I'd rather have this running as a Stored Procedure. What I am wanting to do is identify and potentially correct the summary data for date inconsistencies. We have policies/red flag reports in place to keep this from happening, but we are now cleaning up history. The query below works on a per store basis, but I'd like to be able to run this for all stores in the location table.
I've looked at some procedure codes regarding looping, but everything I try to create seems to give me problems. THe code I'm trying is also below. Does anyone have any suggestions on how to accomplish this?
Working Tables
locations - table contains store information, startup date, address, etc
daily_salessummary - table holds daily sales summary by store (summary should be updated nightly). eod_ts is End of Day Timestamp.
Query
WITH datelist AS(
SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
)
SELECT gendate FROM datelist AS t1
WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN locations AS t2 ON t1.location_id = t2.row_id
WHERE t2.locationCode = 'US_FL_TAMPA_141')
AND gendate > (SELECT start_date FROM locations WHERE locationCode = 'US_FL_TAMPA_141')
Desired Output - could output to an exceptions table
StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01
StoreCode 'MX_OAXACA_SALINA_8344' missing daily summary for 2011-06-05
ProcedureSQL (contains unknown errors)
DECLARE
CURSOR location_table IS
SELECT locationCode FROM locations;
BEGIN
FOR thisSymbol IN ticker_tables LOOP
EXECUTE IMMEDIATE 'WITH datelist AS(
SELECT t1.GenDate as gendate, extract(dow from GenDate) AS dayofweek
FROM (SELECT date as GenDate
FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1 day'::interval) date
) AS t1
)
SELECT gendate FROM datelist AS t1
WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1
JOIN locations AS t2 ON t1.location_id = t2.row_id
WHERE t2.locationCode = '' || location_table.locationCode || '')
AND gendate > (SELECT start_date FROM locations WHERE locationCode = '' || location_table.locationCode || '')';
END LOOP;
END;-- Jason Aleski / IT Specialist
In case anyone else needs similar code, I was able to get this working. Below is the code that pulls the missing datesusing a cursor and returns the information into a table. I'm sure there may be a way to make the code more efficient,but considering this will only get ran maybe once a quarter (for quarterly reports), it works for me. With 700+stores, it takes about 30 minutes to fully run from a reporting server. I have a JAVA program that queries the function"SELECT * FROM eod_missing_dates();" Then sends all the missing dates to a RabbitMQ server to with a worker programto try to rebuild the missing eod summaries and if not, it will send a message to the store managers. Hopefully thiscode will help someone else!<br /><br /><br /> CREATE OR REPLACE FUNCTION eod_missing_dates()<br /> RETURNS TABLE(store_iduuid, location character varying, missing_ts timestamp with time zone) AS<br /> $BODY$<br /> DECLARE<br /> location_cursor CURSOR FOR SELECT * FROM locations ORDER BY store_id;<br /> store_rec location%ROWTYPE;<br /> BEGIN<br/><br /> CREATE TEMP TABLE dr_temptable(dr_ts, dr_dow) AS (SELECT t1.GenDate as gendate, extract(dow from GenDate)AS dayofweek<br /> FROM (SELECT date as GenDate<br /> FROM generate_series('1950-01-01'::date,CURRENT_TIMESTAMP::date,'1day'::interval) date<br /> ) AS t1 <br /> WHERE extract(dowfrom GenDate) NOT IN (0,6));<br /><br /> OPEN location_cursor;<br /> LOOP<br /> FETCH location_cursorINTO store_rec;<br /> EXIT WHEN store_rec IS NULL;<br /><br /> IF NOT FOUND THEN<br /> EXIT;<br/> END IF;<br /> <br /> RAISE INFO '%', 'Checking data/dates for ' || store_rec.location;<br /> RETURNQUERY SELECT store_rec.row_id as store_id, store_rec.location AS location, dr_ts AS missing_ts FROM dr_temptable ASt1<br /> WHERE dr_ts NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1<br /> JOIN location AS t2 ON t1.store_id = t2.row_id<br /> WHERE t2.location = store_rec.location) <br /> AND dr_ts > (SELECT start_date FROM locations WHERElocation=store_rec.location); <br /> END LOOP;<br /> CLOSE location_cursor; <br /> DROP TABLE dr_temptable;<br/> END;<br /> $BODY$<br /> LANGUAGE plpgsql VOLATILE<br /> ;<br /><br /><br /><pre class="moz-signature"cols="72">Jason Aleski / IT Specialist</pre><br /><blockquote cite="mid:CAKyoTgbVfWULPdG1-BX3FSfm=hT3aFRs=avM+kg=0ozhP_RCwg@mail.gmail.com"type="cite"><div class="gmail_extra"><br /><divclass="gmail_quote">On 6 March 2015 at 22:38, Jason Aleski <span dir="ltr"><<a href="mailto:jason.aleski@gmail.com"moz-do-not-send="true" target="_blank">jason.aleski@gmail.com</a>></span> wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><divbgcolor="#FFFFFF" text="#000000"> I know I can do this Java, but I'd rather have this runningas a Stored Procedure. What I am wanting to do is identify and potentially correct the summary data for date inconsistencies. We have policies/red flag reports in place to keep this from happening, but we are now cleaning up history. The query below works on a per store basis, but I'd like to be able to run this for all stores in the location table.<br/><br /> I've looked at some procedure codes regarding looping, but everything I try to create seems to give meproblems. THe code I'm trying is also below. Does anyone have any suggestions on how to accomplish this?<br /><br /><br/><br /><u>Working Tables</u><br /> locations - table contains store information, startup date, address, etc<br /> daily_salessummary - table holds daily sales summary by store (summary should be updated nightly). eod_ts is End ofDay Timestamp.<br /><br /><u>Query</u><br /> WITH datelist AS(<br /> SELECT t1.GenDate as gendate, extract(dow from GenDate)AS dayofweek<br /> FROM (SELECT date as GenDate<br /> FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1day'::interval) date<br /> ) AS t1 <br /> )<br /> SELECTgendate FROM datelist AS t1<br /> WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1<br /> JOIN locations AS t2 ON t1.location_id = t2.row_id<br /> WHERE t2.locationCode= 'US_FL_TAMPA_141')<br /> <br /> AND gendate > (SELECT start_date FROM locationsWHERE locationCode = 'US_FL_TAMPA_141')<br /><br /><br /><u>Desired Output</u> - could output to an exceptions table<br/> StoreCode 'US_FL_TAMA_141' missing daily summary for 1998-01-01<br /> StoreCode 'MX_OAXACA_SALINA_8344' missingdaily summary for 2011-06-05<br /><br /><br /><u>ProcedureSQL</u> (contains unknown errors)<br /> DECLARE<br /> CURSOR location_table IS<br /> SELECT locationCode FROM locations;<br /> BEGIN<br /> FOR thisSymbol IN ticker_tablesLOOP<br /> EXECUTE IMMEDIATE 'WITH datelist AS(<br /> SELECTt1.GenDate as gendate, extract(dow from GenDate) AS dayofweek<br /> FROM (SELECT date as GenDate<br /> FROM generate_series('1985-01-01'::date,CURRENT_TIMESTAMP::date,'1day'::interval) date<br /> ) AS t1 <br /> )<br /> SELECT gendate FROM datelist AS t1<br /> WHERE gendate NOT IN (SELECT t1.eod_ts FROM daily_salessummary AS t1<br /> JOIN locations AS t2 ON t1.location_id = t2.row_id<br/> WHERE t2.locationCode = '' || location_table.locationCode || '') <br /> AND gendate >(SELECT start_date FROM locations WHERE locationCode = '' || location_table.locationCode || '')';<br /> END LOOP;<br/> END;<span class="HOEnZb"><font color="#888888"><br /><br /><br /><br /><pre cols="72">-- Jason Aleski / IT Specialist</pre> </font></span></div></blockquote></div><br /></div></blockquote><br />