Re: Better way to compute moving averages? - Mailing list pgsql-sql

From Jason Aleski
Subject Re: Better way to compute moving averages?
Date
Msg-id 54C30541.7050908@gmail.com
Whole thread Raw
In response to Re: Better way to compute moving averages?  (David G Johnston <david.g.johnston@gmail.com>)
List pgsql-sql
On 1/23/2015 6:50 PM, David G Johnston wrote:
> Jason Aleksi wrote
>> I've been asked compute various moving averages of end of day sales by
>> store.  I can do this for all rows with no problem (same query without
>> the WHERE clause towards the end).  That query took 10-15 minutes to run
>> over approx 3.4 million rows.  I'm sure they will want this information
>> to be added to the daily end of day reports. I can run the query below
>> (excluding the WHERE clause) but it takes almost as long to run one day
>> as it does the entire dataset.  It looks like when I do the inner
>> select, it is still running over the entire dataset.  I have added a
>> "WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '365 days'" (as below) to
>> the inner query, which allows the query to run between 1-2 minutes.
>>
>> Question 1)  This seems to work, but was curious if there is a better way.
> Possibly though I'm not sure you've given enough info...
>
> Regardless, are you positive you need "FOLLOWING" instead of "PRECEDING"
> frames?
>
>
>> Question 2)  Is there a way to specify a date, instead of using current
>> date and current_timestamp, as a variable and use that in the query?  I
>> know I can do that in my Java program
>> using variables, but wasn't sure if there was a way to do this with a
>> function or stored procedure?
> Have you read about functions in the documentation?
>
> CREATE FUNCTION echo_text(input_var text)
> RETURNS text
> AS $$
> SELECT input_var;
> $$
> LANGUAGE sql
> STRICT
> ;
>
>
>> It looks like when I do the inner
>> select, it is still running over the entire dataset.
>> I have added a
>> "WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '365 days'" (as below) to
>> the inner query, which allows the query to run between 1-2 minutes.
>> [...]
>>       FROM end_of_day_data
>>       WHERE eod_ts > CURRENT_TIMESTAMP - INTERVAL '260 days'
>>       GROUP BY store_id, eod_ts, eod_sales
>>       ORDER BY ticker_id, eod_ts
>>     ) as t1
>>     WHERE t1.eod_ts = current_date
>> );
> You expected differently?  GROUP BY (in particular, not only) prevents any
> kind of push-down of quals.
>
> Try something like (not tested or tried - just having an idea this moment):
>
> WITH scope_of_source_with_aging AS (
> SELECT eod_ts,
> first(eod_ts) OVER () AS ref_eod,
> eod_ts - first(eod_ts) OVER (ORDER BY eod_ts) AS eod_age
> FROM end_of_day_data WHERE eod_ts >= current_timestamp - interval '206 days'
> )
> SELECT ref_eod,
> avg(CASE WHEN eod_age BETWEEN 0 AND 7 THEN ... ELSE NULL END) AS
> average_7day
> [, ...]
> FROM scope_of_source_with_aging
> GROUP BY ref_eod
> ;
>
> The first, and most important part, is to limit your source data to only the
> desired time period.  The rest of the above simply tries to limit how much
> work needs to be done.  Since window function, while great, are fairly slow
> if you can minimize their use - in this case by limiting them to simply
> calculating and age - it should help performance.
>
> The concept above should be applicable to your situation generally but you
> will need to clarify your needs and then modify the WHERE and window clauses
> as appropriate.
>
> I also don't recall if there is a native "first" function though you can
> either write one yourself or possibly add an appropriate range clause that
> will accomplish the same goal.
>
> David J.
>
>
>
>
>
>
> --
> View this message in context:
http://postgresql.nabble.com/Better-way-to-compute-moving-averages-tp5835291p5835295.html
> Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
>
>
David J.
    Thanks for the info.

1)  Yes, I pretty sure I need FOLLOWING because I ordered the eod_ts 
(timestamp) using DESC.

2)  The only functions I've created in the past have been primarily 
relating to queries that don't return any data.  Things such as "purge 
duplicate rows".  Nothing that used input and output; only to execute 
tasks on the backend.

Also, I'll look at the WITH method again.  I saw that in an earlier 
search, but didn't fully understand it and looked for another way.

In both cases, thanks for the reassurance and pointing me in a direction 
to explore.

Regards,
JA



pgsql-sql by date:

Previous
From: David G Johnston
Date:
Subject: Re: Better way to compute moving averages?
Next
From: Christopher Browne
Date:
Subject: Re: commit inside a function failing