Thread: obtain the difference between successive rows

obtain the difference between successive rows

From
ochaussavoine
Date:
Hi,
I have a table 'tmvt' with a field 'created' in the row, and would like to
compute the difference between successive rows. The solution I found is:

It is complicate and very long to perform. The problem could be simply
solved with MySql by creating a new field and updating it using a statement
with @.
I beleive this is a common problem for users. Do you know a simpler solution
with postgreSql?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/obtain-the-difference-between-successive-rows-tp5729149.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: obtain the difference between successive rows

From
Raymond O'Donnell
Date:
On 20/10/2012 11:54, ochaussavoine wrote:
> Hi,
> I have a table 'tmvt' with a field 'created' in the row, and would like to
> compute the difference between successive rows. The solution I found is:
>
> It is complicate and very long to perform. The problem could be simply
> solved with MySql by creating a new field and updating it using a statement
> with @.
> I beleive this is a common problem for users. Do you know a simpler solution
> with postgreSql?

I think you can do it with a window function.

http://www.postgresql.org/docs/9.2/static/tutorial-window.html
http://www.postgresql.org/docs/9.2/static/functions-window.html

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: obtain the difference between successive rows

From
Thalis Kalfigkopoulos
Date:
On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 20/10/2012 11:54, ochaussavoine wrote:
>> Hi,
>> I have a table 'tmvt' with a field 'created' in the row, and would like to
>> compute the difference between successive rows. The solution I found is:
>>
>
> I think you can do it with a window function.
>
> http://www.postgresql.org/docs/9.2/static/tutorial-window.html
> http://www.postgresql.org/docs/9.2/static/functions-window.html
>
> Ray.

In particular you're looking probably for the lag() window function.
For example if you have a timestamp column "ts" that's increasing
monotonically and you want to check the difference of each row's
timestamp with the chronologically previous row's timestamp you'd do
something like:
$ SELECT id, ts, lag(ts) OVER (order by ts) AS prev_ts FROM mytable;
This will display as third column the previous row's ts.

You may find reading this introduction to window fuctions useful:
https://www.pgcon.org/2009/schedule/attachments/98_Windowing%20Functions.pdf

best tregards,
Thalis


Re: obtain the difference between successive rows

From
Berend Tober
Date:
Thalis Kalfigkopoulos wrote:
> On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
>> On 20/10/2012 11:54, ochaussavoine wrote:
>>> I have a table 'tmvt' with a field 'created' in the row, and would like to
>>> compute the difference between successive rows. The solution I found is:
>>
>> I think you can do it with a window function.
>>
>
> In particular you're looking probably for the lag() window function.

What about if there is more than one column you want the
difference for (... coincidentally I am writing a article on this
topic right now! ...), say a table which is used to record a
metered quantity at not-quite regular intervals:

CREATE TABLE electricity
(
   current_reading_date date,
   current_meter_reading integer
);


with sample data:


'2012-09-07',2158
'2012-10-05',3018



and I want an output such as:


Meter Read on October 5

Current      Previous          kWh
Reading         Reading          Used
-----------------------------------
3018   -    2158   =    860

Number service days = 28


I am working on a write-up of a neat solution using CTE's, but
would be interested in other's views.



Re: obtain the difference between successive rows

From
Chris Angelico
Date:
On Sun, Oct 21, 2012 at 2:30 AM, Berend Tober <btober@broadstripe.net> wrote:
> What about if there is more than one column you want the difference for (...
> coincidentally I am writing a article on this topic right now! ...), say a
> table which is used to record a metered quantity at not-quite regular
> intervals:
> ...
>
> I am working on a write-up of a neat solution using CTE's, but would be
> interested in other's views.

To be quite honest, I would simply read the table directly and then do
the processing in an application language :) But two window functions
should do the trick. Whether or not it's actually more efficient that
way is another question.

ChrisA


Re: obtain the difference between successive rows

From
Thalis Kalfigkopoulos
Date:
> What about if there is more than one column you want the difference for (...
> coincidentally I am writing a article on this topic right now! ...), say a
> table which is used to record a metered quantity at not-quite regular
> intervals:
>
> CREATE TABLE electricity
> (
>   current_reading_date date,
>   current_meter_reading integer
> );
>
>
> with sample data:
>
>
> '2012-09-07',2158
> '2012-10-05',3018
>
>
>
> and I want an output such as:
>
>
> Meter Read on October 5
>
> Current      Previous          kWh
> Reading      Reading          Used
> -----------------------------------
> 3018   -        2158   =        860
>
> Number service days = 28

No problem with that either.
$ SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER
BY current_reading_date) AS kWh_diff, extract('days' FROM
current_reading_date - lag(current_reading_date) OVER(ORDER BY
current_reading_date)) as num_service_days FROM mytable;

Note how ORDER BY is in both cases done by current_reading_date. This
is because the current_reading_date defines the concept of
previous/next row whose values (either current_meter_reading or
current_reading_date) I want to be comparing.

regards,
Thalis


Re: obtain the difference between successive rows

From
Berend Tober
Date:
Thalis Kalfigkopoulos wrote:
> SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER
> BY current_reading_date) AS kWh_diff, extract('days' FROM
> current_reading_date - lag(current_reading_date) OVER(ORDER BY
> current_reading_date)) as num_service_days FROM mytable;

How would you get the previous reading (and perhaps the previous
read date) to also appear on the same output row? The sample
table with the subtraction I showed for illustration is literally
what is printed on the bill ... they are not just presenting the
quantity used and the number of days, but actually the dates and
meter readings used to do the arithmetic.




Re: obtain the difference between successive rows

From
Raymond O'Donnell
Date:
On 20/10/2012 17:02, Berend Tober wrote:
> Thalis Kalfigkopoulos wrote:
>> SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER
>> BY current_reading_date) AS kWh_diff, extract('days' FROM
>> current_reading_date - lag(current_reading_date) OVER(ORDER BY
>> current_reading_date)) as num_service_days FROM mytable;
>
> How would you get the previous reading (and perhaps the previous read
> date) to also appear on the same output row? The sample table with the
> subtraction I showed for illustration is literally what is printed on
> the bill ... they are not just presenting the quantity used and the
> number of days, but actually the dates and meter readings used to do the
> arithmetic.

Just include them in the SELECT:

SELECT
  lag(current_meter_reading) OVER(ORDER BY current_reading_date) AS
kWh_prev,
  current_meter_reading - lag(current_meter_reading) OVER(ORDER BY
current_reading_date) AS kWh_diff,

  (...etc...)


Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: obtain the difference between successive rows

From
Tom Lane
Date:
Chris Angelico <rosuav@gmail.com> writes:
> To be quite honest, I would simply read the table directly and then do
> the processing in an application language :) But two window functions
> should do the trick. Whether or not it's actually more efficient that
> way is another question.

FWIW, Postgres is reasonably smart about the case of multiple window
functions with identical window definitions --- once you've got one
lag() in the query, adding more isn't going to cost much.

Having said that, they are pretty expensive.  I tend to agree that doing
the processing on the application side might be faster --- but only if
you've got a place to put such code there.  If you've just got generic
query-result display code, there may not be any convenient way to do it.

            regards, tom lane


Re: obtain the difference between successive rows

From
Raymond O'Donnell
Date:
On 20/10/2012 17:23, Tom Lane wrote:
> Chris Angelico <rosuav@gmail.com> writes:
>> To be quite honest, I would simply read the table directly and then do
>> the processing in an application language :) But two window functions
>> should do the trick. Whether or not it's actually more efficient that
>> way is another question.
>
> FWIW, Postgres is reasonably smart about the case of multiple window
> functions with identical window definitions --- once you've got one
> lag() in the query, adding more isn't going to cost much.

Out of curiosity, would there be much difference between having multiple
lag()s in the SELECT and a single one in a CTE?

> Having said that, they are pretty expensive.  I tend to agree that doing
> the processing on the application side might be faster --- but only if
> you've got a place to put such code there.  If you've just got generic

True, assuming that you're working with a language that handles dates
well... I do a lot of PHP, and have found that it's generally safer to
handle date arithmetic in Postges.

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: obtain the difference between successive rows

From
Berend Tober
Date:
Raymond O'Donnell wrote:
> On 20/10/2012 17:02, Berend Tober wrote:
>> Thalis Kalfigkopoulos wrote:
>> How would you get the previous reading (and perhaps the previous read
>> date) to also appear ...
>
> Just include them in the SELECT:
>


Well, that is surprisingly easy!

How about this then: the table includes data for more than one
meter. I moved (I've been keeping this data for two decades ...
yes, I know...) to a new house, and in the new house, the utility
company has replaced the meter (one of those "smart" meters). So
the table has a foreign key reference to the primary key
identifying the meter:

CREATE TABLE electricity
(
   electric_meter_pk integer,
   current_reading_date date,
   current_meter_reading integer
);

with sample data:

  2 | 1997-04-14   |    0
  2 | 1997-05-08   |  573
  2 | 1997-06-12   | 1709
  ...
  2 | 2009-09-14   |152941
  3 | 2009-06-26   | 68502
  3 | 2009-08-13   | 69738
...
  3 | 2012-07-06   |118953
  3 | 2012-07-18   |119185
  4 | 2012-07-18   |     0
  4 | 2012-08-06   |   887
  4 | 2012-09-07   |  2158
  4 | 2012-10-05   |  3018


Your suggestion almost worked as is for this, except that you
have to note that reading for meter #2 and meter #3 overlap (I
briefly owned two houses), and that seemed to confuse the lag()
function:

SELECT
   electric_meter_pk,
   lag(reading_date)
      OVER(ORDER BY reading_date) as prev_date,
   reading_date,
   lag(meter_reading)
      OVER(ORDER BY reading_date) AS prev_reading,
   meter_reading,
   meter_reading - lag(meter_reading)
      OVER(ORDER BY reading_date) AS kWh_diff,
   reading_date - lag(reading_date)
      OVER(ORDER BY reading_date) as num_service_days
FROM electric
order by 1,3;

  2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 |  1179 |32
  2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 |  1365 |30
  2 | 2009-06-26 | 2009-07-14 |  68502 | 149808 | 81306 |18
  2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 |  1776 |29
  2 | 2009-09-12 | 2009-09-14 |  70934 | 152941 | 82007 | 2
  3 | 2009-06-10 | 2009-06-26 | 148139 |  68502 |-79637 |16
  3 | 2009-08-12 | 2009-08-13 | 151584 |  69738 |-81846 | 1
  3 | 2009-08-13 | 2009-09-12 |  69738 |  70934 |  1196 |30
  3 | 2009-09-14 | 2009-10-14 | 152941 |  71918 |-81023 |30
  3 | 2009-10-14 | 2009-11-11 |  71918 |  72952 |  1034 |28




Re: obtain the difference between successive rows

From
Chris Angelico
Date:
On Sun, Oct 21, 2012 at 3:29 AM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 20/10/2012 17:23, Tom Lane wrote:
>> Having said that, they are pretty expensive.  I tend to agree that doing
>> the processing on the application side might be faster --- but only if
>> you've got a place to put such code there.  If you've just got generic
>
> True, assuming that you're working with a language that handles dates
> well... I do a lot of PHP, and have found that it's generally safer to
> handle date arithmetic in Postges.

That's because PHP is a sucky language :)

For a job like this, I'd probably whip up something in Pike. Excellent
string handling, easy database access (including an efficient
implementation of the pgsql protocol - skips the usual underlying
library and talks directly to the server), and the best Unicode
support I've seen in any application language (now equalled by Python,
as of version 3.3 that just came out). Biggest downside is that it's a
tad obscure.

object db=Sql.Sql("pgsql://username:password@hostname/database");
db->query("select blah from blah");

ChrisA


Re: obtain the difference between successive rows

From
Tom Lane
Date:
"Raymond O'Donnell" <rod@iol.ie> writes:
> On 20/10/2012 17:23, Tom Lane wrote:
>> FWIW, Postgres is reasonably smart about the case of multiple window
>> functions with identical window definitions --- once you've got one
>> lag() in the query, adding more isn't going to cost much.

> Out of curiosity, would there be much difference between having multiple
> lag()s in the SELECT and a single one in a CTE?

Not sure what you're proposing?  I don't see how you'd solve this
problem with a CTE, at least not without a join, which seems unlikely
to be a win.

            regards, tom lane


Re: obtain the difference between successive rows

From
Berend Tober
Date:
Berend Tober wrote:
> Raymond O'Donnell wrote:
>> On 20/10/2012 17:02, Berend Tober wrote:
>>> Thalis Kalfigkopoulos wrote:
>>> How would you get the previous reading (and perhaps the
>>> previous read
>>> date) to also appear ...
>>
>> Just include them in the SELECT:
>
> Well, that is surprisingly easy!
>
> How about this then: the table includes data for more than one
> meter....


Almost answering my own question. Adding the meter key to the lag:

SELECT
   electric_meter_pk,
   lag(reading_date)
      OVER(ORDER BY electric_meter_pk,reading_date)
      as prev_date,
   reading_date,
   lag(meter_reading)
      OVER(ORDER BY electric_meter_pk,reading_date)
      AS prev_reading,
   meter_reading,
   meter_reading - lag(meter_reading)
      OVER(ORDER BY electric_meter_pk,reading_date)
      AS kWh_diff,
   reading_date - lag(reading_date)
      OVER(ORDER BY electric_meter_pk,reading_date)
      as num_service_days
FROM home.electric
order by 1,3;

Gives all good as far as lining up dates, except it does not
cross the new-meter boundary gracefully:

  2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 |    1776 | 29
  2 | 2009-08-12 | 2009-09-14 | 151584 | 152941 |    1357 | 33
*3 | 2009-09-14 | 2009-06-26 | 152941 |  68502 |  -84439 |-80
  3 | 2009-06-26 | 2009-08-13 |  68502 |  69738 |    1236 | 48
  3 | 2009-08-13 | 2009-09-12 |  69738 |  70934 |    1196 | 30
  ...
  3 | 2012-05-04 | 2012-06-07 | 116091 | 117469 |    1378 | 34
  3 | 2012-06-07 | 2012-07-06 | 117469 | 118953 |    1484 | 29
  3 | 2012-07-06 | 2012-07-18 | 118953 | 119185 |     232 | 12
*4 | 2012-07-18 | 2012-07-18 | 119185 |      0 | -119185 |  0
  4 | 2012-07-18 | 2012-08-06 |      0 |    887 |     887 | 19
  4 | 2012-08-06 | 2012-09-07 |    887 |   2158 |    1271 | 32
  4 | 2012-09-07 | 2012-10-05 |   2158 |   3018 |     860 | 28


The first-row-initialization problem is what lead me to consider
a recursive CTE. I have something that works and does not use
window functions, but I think it requires more detailed
explanation than I have prepared at this time.


Re: obtain the difference between successive rows

From
Raymond O'Donnell
Date:
On 20/10/2012 17:50, Tom Lane wrote:
> "Raymond O'Donnell" <rod@iol.ie> writes:
>> On 20/10/2012 17:23, Tom Lane wrote:
>>> FWIW, Postgres is reasonably smart about the case of multiple window
>>> functions with identical window definitions --- once you've got one
>>> lag() in the query, adding more isn't going to cost much.
>
>> Out of curiosity, would there be much difference between having multiple
>> lag()s in the SELECT and a single one in a CTE?
>
> Not sure what you're proposing?  I don't see how you'd solve this
> problem with a CTE, at least not without a join, which seems unlikely
> to be a win.

Yes, I see what you mean.... was waving my hands a bit. :-)

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: obtain the difference between successive rows

From
Виктор Егоров
Date:
2012/10/20 Berend Tober <btober@broadstripe.net>:
> Your suggestion almost worked as is for this, except that you have to note
> that reading for meter #2 and meter #3 overlap (I briefly owned two houses),
> and that seemed to confuse the lag() function:
>
> SELECT
>   electric_meter_pk,
>   lag(reading_date)
>      OVER(ORDER BY reading_date) as prev_date,
>   reading_date,
>   lag(meter_reading)
>      OVER(ORDER BY reading_date) AS prev_reading,
>   meter_reading,
>   meter_reading - lag(meter_reading)
>      OVER(ORDER BY reading_date) AS kWh_diff,
>   reading_date - lag(reading_date)
>      OVER(ORDER BY reading_date) as num_service_days
> FROM electric
> order by 1,3;
>
>  2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 |  1179 |32
>  2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 |  1365 |30
>  2 | 2009-06-26 | 2009-07-14 |  68502 | 149808 | 81306 |18
>  2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 |  1776 |29
>  2 | 2009-09-12 | 2009-09-14 |  70934 | 152941 | 82007 | 2
>  3 | 2009-06-10 | 2009-06-26 | 148139 |  68502 |-79637 |16
>  3 | 2009-08-12 | 2009-08-13 | 151584 |  69738 |-81846 | 1
>  3 | 2009-08-13 | 2009-09-12 |  69738 |  70934 |  1196 |30
>  3 | 2009-09-14 | 2009-10-14 | 152941 |  71918 |-81023 |30
>  3 | 2009-10-14 | 2009-11-11 |  71918 |  72952 |  1034 |28

You can do
 … OVER(PARTITION BY electric_meter_pk ORDER BY reading_date)
to split you data by meter.

--
Victor Y. Yegorov


Re: obtain the difference between successive rows

From
Berend Tober
Date:
Виктор Егоров wrote:
> 2012/10/20 Berend Tober <btober@broadstripe.net>:
>> Your suggestion almost worked as is for this, except that you have to note
>> that reading for meter #2 and meter #3 overlap ...
> You can do
>   … OVER(PARTITION BY electric_meter_pk ORDER BY reading_date)
> to split you data by meter.
>


That looks like it works great! Much simpler-looking SQL than what I was
working on.

Thanks!


Re: obtain the difference between successive rows

From
Berend Tober
Date:
Thalis Kalfigkopoulos wrote:
> On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
>> On 20/10/2012 11:54, ochaussavoine wrote:
>>> I have a table 'tmvt' with a field 'created' in the row, and would like to
>>> compute the difference between successive rows. The solution I found is:
>>
>> I think you can do it with a window function.
>>
>
> In particular you're looking probably for the lag() window function.

What about if there is more than one column you want the
difference for (... coincidentally I am writing a article on this
topic right now! ...), say a table which is used to record a
metered quantity at not-quite regular intervals:

CREATE TABLE electricity
(
   current_reading_date date,
   current_meter_reading integer
);


with sample data:


'2012-09-07',2158
'2012-10-05',3018



and I want an output such as:


Meter Read on October 5

Current      Previous          kWh
Reading         Reading          Used
-----------------------------------
3018   -    2158   =    860

Number service days = 28


I am working on a write-up of a neat solution using CTE's, but
would be interested in other's views.



Re: obtain the difference between successive rows

From
Jasen Betts
Date:
On 2012-10-20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Raymond O'Donnell" <rod@iol.ie> writes:
>> On 20/10/2012 17:23, Tom Lane wrote:
>>> FWIW, Postgres is reasonably smart about the case of multiple window
>>> functions with identical window definitions --- once you've got one
>>> lag() in the query, adding more isn't going to cost much.
>
>> Out of curiosity, would there be much difference between having multiple
>> lag()s in the SELECT and a single one in a CTE?
>
> Not sure what you're proposing?  I don't see how you'd solve this
> problem with a CTE, at least not without a join, which seems unlikely
> to be a win.


select generate_series(1,20)              as a ,
      ( generate_series(1,20)*9+random()*7)::int as b
  into temp table foo;
delete from foo where random()>0.5;

with
 j as ( select f,lag(f) over (order by f.a)as g from foo as f )
 select (f).*,(g).a as "lag a",(g).b as "lag b",
 (f).a-(g).a as "diff(a)" ,(f).b-(g).b as "diff(b)" from j;

--
⚂⚃ 100% natural

Re: obtain the difference between successive rows

From
Jasen Betts
Date:
On 2012-10-20, Berend Tober <btober@broadstripe.net> wrote:
> Thalis Kalfigkopoulos wrote:
>> On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote:
>>> On 20/10/2012 11:54, ochaussavoine wrote:
>>>> I have a table 'tmvt' with a field 'created' in the row, and would like to
>>>> compute the difference between successive rows. The solution I found is:
>>>
>>> I think you can do it with a window function.
>>>
>>
>> In particular you're looking probably for the lag() window function.
>
> What about if there is more than one column you want the
> difference for (... coincidentally I am writing a article on this
> topic right now! ...), say a table which is used to record a
> metered quantity at not-quite regular intervals:
>
> CREATE TABLE electricity
> (
>    current_reading_date date,
>    current_meter_reading integer
> );
>
>
> with sample data:
>
>
> '2012-09-07',2158
> '2012-10-05',3018
>
>
>
> and I want an output such as:
>
>
> Meter Read on October 5
>
> Current      Previous          kWh
> Reading         Reading          Used
> -----------------------------------
> 3018   -    2158   =    860
>
> Number service days = 28
>
>
> I am working on a write-up of a neat solution using CTE's, but
> would be interested in other's views.

electricity meter may bis a bad example as usage meters often have
fewer digits than are needed to track all historical usage

eg:

 '2012-05-07',997743
 '2012-06-06',999601
 '2012-07-05',000338
 '2012-08-06',001290
 '2012-09-07',002158
 '2012-10-05',003018


--
⚂⚃ 100% natural

Re: obtain the difference between successive rows

From
"Kevin Grittner"
Date:
Jasen Betts wrote:

> electricity meter may bis a bad example as usage meters often have
> fewer digits than are needed to track all historical usage
>
> eg:
>
>  '2012-05-07',997743
>  '2012-06-06',999601
>  '2012-07-05',000338
>  '2012-08-06',001290
>  '2012-09-07',002158
>  '2012-10-05',003018

Wrap-around can be handled pretty easily. It's meter replacement
that is a challenge.  :-)

SELECT
    current_reading_date as "reading date",
    lag(current_meter_reading, 1)
      over (order by current_reading_date) as "prior reading",
    current_meter_reading as "current reading",
    (1000000000 + current_meter_reading
      - lag(current_meter_reading, 1)
          over (order by current_reading_date)) % 1000000 as usage
  from electricity;

 reading date | prior reading | current reading | usage
--------------+---------------+-----------------+-------
 2012-05-07   |               |          997743 |      
 2012-06-06   |        997743 |          999601 |  1858
 2012-07-05   |        999601 |             338 |   737
 2012-08-06   |           338 |            1290 |   952
 2012-09-07   |          1290 |            2158 |   868
 2012-10-05   |          2158 |            3018 |   860
(6 rows)

-Kevin