Thread: obtain the difference between successive rows
Hi, I have a table 'tmvt' with a field 'created' in the row, and would like to compute the difference between successive rows. The solution I found is: It is complicate and very long to perform. The problem could be simply solved with MySql by creating a new field and updating it using a statement with @. I beleive this is a common problem for users. Do you know a simpler solution with postgreSql? -- View this message in context: http://postgresql.1045698.n5.nabble.com/obtain-the-difference-between-successive-rows-tp5729149.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 20/10/2012 11:54, ochaussavoine wrote: > Hi, > I have a table 'tmvt' with a field 'created' in the row, and would like to > compute the difference between successive rows. The solution I found is: > > It is complicate and very long to perform. The problem could be simply > solved with MySql by creating a new field and updating it using a statement > with @. > I beleive this is a common problem for users. Do you know a simpler solution > with postgreSql? I think you can do it with a window function. http://www.postgresql.org/docs/9.2/static/tutorial-window.html http://www.postgresql.org/docs/9.2/static/functions-window.html Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote: > On 20/10/2012 11:54, ochaussavoine wrote: >> Hi, >> I have a table 'tmvt' with a field 'created' in the row, and would like to >> compute the difference between successive rows. The solution I found is: >> > > I think you can do it with a window function. > > http://www.postgresql.org/docs/9.2/static/tutorial-window.html > http://www.postgresql.org/docs/9.2/static/functions-window.html > > Ray. In particular you're looking probably for the lag() window function. For example if you have a timestamp column "ts" that's increasing monotonically and you want to check the difference of each row's timestamp with the chronologically previous row's timestamp you'd do something like: $ SELECT id, ts, lag(ts) OVER (order by ts) AS prev_ts FROM mytable; This will display as third column the previous row's ts. You may find reading this introduction to window fuctions useful: https://www.pgcon.org/2009/schedule/attachments/98_Windowing%20Functions.pdf best tregards, Thalis
Thalis Kalfigkopoulos wrote: > On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote: >> On 20/10/2012 11:54, ochaussavoine wrote: >>> I have a table 'tmvt' with a field 'created' in the row, and would like to >>> compute the difference between successive rows. The solution I found is: >> >> I think you can do it with a window function. >> > > In particular you're looking probably for the lag() window function. What about if there is more than one column you want the difference for (... coincidentally I am writing a article on this topic right now! ...), say a table which is used to record a metered quantity at not-quite regular intervals: CREATE TABLE electricity ( current_reading_date date, current_meter_reading integer ); with sample data: '2012-09-07',2158 '2012-10-05',3018 and I want an output such as: Meter Read on October 5 Current Previous kWh Reading Reading Used ----------------------------------- 3018 - 2158 = 860 Number service days = 28 I am working on a write-up of a neat solution using CTE's, but would be interested in other's views.
On Sun, Oct 21, 2012 at 2:30 AM, Berend Tober <btober@broadstripe.net> wrote: > What about if there is more than one column you want the difference for (... > coincidentally I am writing a article on this topic right now! ...), say a > table which is used to record a metered quantity at not-quite regular > intervals: > ... > > I am working on a write-up of a neat solution using CTE's, but would be > interested in other's views. To be quite honest, I would simply read the table directly and then do the processing in an application language :) But two window functions should do the trick. Whether or not it's actually more efficient that way is another question. ChrisA
> What about if there is more than one column you want the difference for (... > coincidentally I am writing a article on this topic right now! ...), say a > table which is used to record a metered quantity at not-quite regular > intervals: > > CREATE TABLE electricity > ( > current_reading_date date, > current_meter_reading integer > ); > > > with sample data: > > > '2012-09-07',2158 > '2012-10-05',3018 > > > > and I want an output such as: > > > Meter Read on October 5 > > Current Previous kWh > Reading Reading Used > ----------------------------------- > 3018 - 2158 = 860 > > Number service days = 28 No problem with that either. $ SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER BY current_reading_date) AS kWh_diff, extract('days' FROM current_reading_date - lag(current_reading_date) OVER(ORDER BY current_reading_date)) as num_service_days FROM mytable; Note how ORDER BY is in both cases done by current_reading_date. This is because the current_reading_date defines the concept of previous/next row whose values (either current_meter_reading or current_reading_date) I want to be comparing. regards, Thalis
Thalis Kalfigkopoulos wrote: > SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER > BY current_reading_date) AS kWh_diff, extract('days' FROM > current_reading_date - lag(current_reading_date) OVER(ORDER BY > current_reading_date)) as num_service_days FROM mytable; How would you get the previous reading (and perhaps the previous read date) to also appear on the same output row? The sample table with the subtraction I showed for illustration is literally what is printed on the bill ... they are not just presenting the quantity used and the number of days, but actually the dates and meter readings used to do the arithmetic.
On 20/10/2012 17:02, Berend Tober wrote: > Thalis Kalfigkopoulos wrote: >> SELECT current_meter_reading - lag(current_meter_reading) OVER(ORDER >> BY current_reading_date) AS kWh_diff, extract('days' FROM >> current_reading_date - lag(current_reading_date) OVER(ORDER BY >> current_reading_date)) as num_service_days FROM mytable; > > How would you get the previous reading (and perhaps the previous read > date) to also appear on the same output row? The sample table with the > subtraction I showed for illustration is literally what is printed on > the bill ... they are not just presenting the quantity used and the > number of days, but actually the dates and meter readings used to do the > arithmetic. Just include them in the SELECT: SELECT lag(current_meter_reading) OVER(ORDER BY current_reading_date) AS kWh_prev, current_meter_reading - lag(current_meter_reading) OVER(ORDER BY current_reading_date) AS kWh_diff, (...etc...) Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
Chris Angelico <rosuav@gmail.com> writes: > To be quite honest, I would simply read the table directly and then do > the processing in an application language :) But two window functions > should do the trick. Whether or not it's actually more efficient that > way is another question. FWIW, Postgres is reasonably smart about the case of multiple window functions with identical window definitions --- once you've got one lag() in the query, adding more isn't going to cost much. Having said that, they are pretty expensive. I tend to agree that doing the processing on the application side might be faster --- but only if you've got a place to put such code there. If you've just got generic query-result display code, there may not be any convenient way to do it. regards, tom lane
On 20/10/2012 17:23, Tom Lane wrote: > Chris Angelico <rosuav@gmail.com> writes: >> To be quite honest, I would simply read the table directly and then do >> the processing in an application language :) But two window functions >> should do the trick. Whether or not it's actually more efficient that >> way is another question. > > FWIW, Postgres is reasonably smart about the case of multiple window > functions with identical window definitions --- once you've got one > lag() in the query, adding more isn't going to cost much. Out of curiosity, would there be much difference between having multiple lag()s in the SELECT and a single one in a CTE? > Having said that, they are pretty expensive. I tend to agree that doing > the processing on the application side might be faster --- but only if > you've got a place to put such code there. If you've just got generic True, assuming that you're working with a language that handles dates well... I do a lot of PHP, and have found that it's generally safer to handle date arithmetic in Postges. Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
Raymond O'Donnell wrote: > On 20/10/2012 17:02, Berend Tober wrote: >> Thalis Kalfigkopoulos wrote: >> How would you get the previous reading (and perhaps the previous read >> date) to also appear ... > > Just include them in the SELECT: > Well, that is surprisingly easy! How about this then: the table includes data for more than one meter. I moved (I've been keeping this data for two decades ... yes, I know...) to a new house, and in the new house, the utility company has replaced the meter (one of those "smart" meters). So the table has a foreign key reference to the primary key identifying the meter: CREATE TABLE electricity ( electric_meter_pk integer, current_reading_date date, current_meter_reading integer ); with sample data: 2 | 1997-04-14 | 0 2 | 1997-05-08 | 573 2 | 1997-06-12 | 1709 ... 2 | 2009-09-14 |152941 3 | 2009-06-26 | 68502 3 | 2009-08-13 | 69738 ... 3 | 2012-07-06 |118953 3 | 2012-07-18 |119185 4 | 2012-07-18 | 0 4 | 2012-08-06 | 887 4 | 2012-09-07 | 2158 4 | 2012-10-05 | 3018 Your suggestion almost worked as is for this, except that you have to note that reading for meter #2 and meter #3 overlap (I briefly owned two houses), and that seemed to confuse the lag() function: SELECT electric_meter_pk, lag(reading_date) OVER(ORDER BY reading_date) as prev_date, reading_date, lag(meter_reading) OVER(ORDER BY reading_date) AS prev_reading, meter_reading, meter_reading - lag(meter_reading) OVER(ORDER BY reading_date) AS kWh_diff, reading_date - lag(reading_date) OVER(ORDER BY reading_date) as num_service_days FROM electric order by 1,3; 2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 | 1179 |32 2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 | 1365 |30 2 | 2009-06-26 | 2009-07-14 | 68502 | 149808 | 81306 |18 2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 |29 2 | 2009-09-12 | 2009-09-14 | 70934 | 152941 | 82007 | 2 3 | 2009-06-10 | 2009-06-26 | 148139 | 68502 |-79637 |16 3 | 2009-08-12 | 2009-08-13 | 151584 | 69738 |-81846 | 1 3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 |30 3 | 2009-09-14 | 2009-10-14 | 152941 | 71918 |-81023 |30 3 | 2009-10-14 | 2009-11-11 | 71918 | 72952 | 1034 |28
On Sun, Oct 21, 2012 at 3:29 AM, Raymond O'Donnell <rod@iol.ie> wrote: > On 20/10/2012 17:23, Tom Lane wrote: >> Having said that, they are pretty expensive. I tend to agree that doing >> the processing on the application side might be faster --- but only if >> you've got a place to put such code there. If you've just got generic > > True, assuming that you're working with a language that handles dates > well... I do a lot of PHP, and have found that it's generally safer to > handle date arithmetic in Postges. That's because PHP is a sucky language :) For a job like this, I'd probably whip up something in Pike. Excellent string handling, easy database access (including an efficient implementation of the pgsql protocol - skips the usual underlying library and talks directly to the server), and the best Unicode support I've seen in any application language (now equalled by Python, as of version 3.3 that just came out). Biggest downside is that it's a tad obscure. object db=Sql.Sql("pgsql://username:password@hostname/database"); db->query("select blah from blah"); ChrisA
"Raymond O'Donnell" <rod@iol.ie> writes: > On 20/10/2012 17:23, Tom Lane wrote: >> FWIW, Postgres is reasonably smart about the case of multiple window >> functions with identical window definitions --- once you've got one >> lag() in the query, adding more isn't going to cost much. > Out of curiosity, would there be much difference between having multiple > lag()s in the SELECT and a single one in a CTE? Not sure what you're proposing? I don't see how you'd solve this problem with a CTE, at least not without a join, which seems unlikely to be a win. regards, tom lane
Berend Tober wrote: > Raymond O'Donnell wrote: >> On 20/10/2012 17:02, Berend Tober wrote: >>> Thalis Kalfigkopoulos wrote: >>> How would you get the previous reading (and perhaps the >>> previous read >>> date) to also appear ... >> >> Just include them in the SELECT: > > Well, that is surprisingly easy! > > How about this then: the table includes data for more than one > meter.... Almost answering my own question. Adding the meter key to the lag: SELECT electric_meter_pk, lag(reading_date) OVER(ORDER BY electric_meter_pk,reading_date) as prev_date, reading_date, lag(meter_reading) OVER(ORDER BY electric_meter_pk,reading_date) AS prev_reading, meter_reading, meter_reading - lag(meter_reading) OVER(ORDER BY electric_meter_pk,reading_date) AS kWh_diff, reading_date - lag(reading_date) OVER(ORDER BY electric_meter_pk,reading_date) as num_service_days FROM home.electric order by 1,3; Gives all good as far as lining up dates, except it does not cross the new-meter boundary gracefully: 2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 | 29 2 | 2009-08-12 | 2009-09-14 | 151584 | 152941 | 1357 | 33 *3 | 2009-09-14 | 2009-06-26 | 152941 | 68502 | -84439 |-80 3 | 2009-06-26 | 2009-08-13 | 68502 | 69738 | 1236 | 48 3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 | 30 ... 3 | 2012-05-04 | 2012-06-07 | 116091 | 117469 | 1378 | 34 3 | 2012-06-07 | 2012-07-06 | 117469 | 118953 | 1484 | 29 3 | 2012-07-06 | 2012-07-18 | 118953 | 119185 | 232 | 12 *4 | 2012-07-18 | 2012-07-18 | 119185 | 0 | -119185 | 0 4 | 2012-07-18 | 2012-08-06 | 0 | 887 | 887 | 19 4 | 2012-08-06 | 2012-09-07 | 887 | 2158 | 1271 | 32 4 | 2012-09-07 | 2012-10-05 | 2158 | 3018 | 860 | 28 The first-row-initialization problem is what lead me to consider a recursive CTE. I have something that works and does not use window functions, but I think it requires more detailed explanation than I have prepared at this time.
On 20/10/2012 17:50, Tom Lane wrote: > "Raymond O'Donnell" <rod@iol.ie> writes: >> On 20/10/2012 17:23, Tom Lane wrote: >>> FWIW, Postgres is reasonably smart about the case of multiple window >>> functions with identical window definitions --- once you've got one >>> lag() in the query, adding more isn't going to cost much. > >> Out of curiosity, would there be much difference between having multiple >> lag()s in the SELECT and a single one in a CTE? > > Not sure what you're proposing? I don't see how you'd solve this > problem with a CTE, at least not without a join, which seems unlikely > to be a win. Yes, I see what you mean.... was waving my hands a bit. :-) Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
2012/10/20 Berend Tober <btober@broadstripe.net>: > Your suggestion almost worked as is for this, except that you have to note > that reading for meter #2 and meter #3 overlap (I briefly owned two houses), > and that seemed to confuse the lag() function: > > SELECT > electric_meter_pk, > lag(reading_date) > OVER(ORDER BY reading_date) as prev_date, > reading_date, > lag(meter_reading) > OVER(ORDER BY reading_date) AS prev_reading, > meter_reading, > meter_reading - lag(meter_reading) > OVER(ORDER BY reading_date) AS kWh_diff, > reading_date - lag(reading_date) > OVER(ORDER BY reading_date) as num_service_days > FROM electric > order by 1,3; > > 2 | 2009-04-09 | 2009-05-11 | 145595 | 146774 | 1179 |32 > 2 | 2009-05-11 | 2009-06-10 | 146774 | 148139 | 1365 |30 > 2 | 2009-06-26 | 2009-07-14 | 68502 | 149808 | 81306 |18 > 2 | 2009-07-14 | 2009-08-12 | 149808 | 151584 | 1776 |29 > 2 | 2009-09-12 | 2009-09-14 | 70934 | 152941 | 82007 | 2 > 3 | 2009-06-10 | 2009-06-26 | 148139 | 68502 |-79637 |16 > 3 | 2009-08-12 | 2009-08-13 | 151584 | 69738 |-81846 | 1 > 3 | 2009-08-13 | 2009-09-12 | 69738 | 70934 | 1196 |30 > 3 | 2009-09-14 | 2009-10-14 | 152941 | 71918 |-81023 |30 > 3 | 2009-10-14 | 2009-11-11 | 71918 | 72952 | 1034 |28 You can do … OVER(PARTITION BY electric_meter_pk ORDER BY reading_date) to split you data by meter. -- Victor Y. Yegorov
Виктор Егоров wrote: > 2012/10/20 Berend Tober <btober@broadstripe.net>: >> Your suggestion almost worked as is for this, except that you have to note >> that reading for meter #2 and meter #3 overlap ... > You can do > … OVER(PARTITION BY electric_meter_pk ORDER BY reading_date) > to split you data by meter. > That looks like it works great! Much simpler-looking SQL than what I was working on. Thanks!
Thalis Kalfigkopoulos wrote: > On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote: >> On 20/10/2012 11:54, ochaussavoine wrote: >>> I have a table 'tmvt' with a field 'created' in the row, and would like to >>> compute the difference between successive rows. The solution I found is: >> >> I think you can do it with a window function. >> > > In particular you're looking probably for the lag() window function. What about if there is more than one column you want the difference for (... coincidentally I am writing a article on this topic right now! ...), say a table which is used to record a metered quantity at not-quite regular intervals: CREATE TABLE electricity ( current_reading_date date, current_meter_reading integer ); with sample data: '2012-09-07',2158 '2012-10-05',3018 and I want an output such as: Meter Read on October 5 Current Previous kWh Reading Reading Used ----------------------------------- 3018 - 2158 = 860 Number service days = 28 I am working on a write-up of a neat solution using CTE's, but would be interested in other's views.
On 2012-10-20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Raymond O'Donnell" <rod@iol.ie> writes: >> On 20/10/2012 17:23, Tom Lane wrote: >>> FWIW, Postgres is reasonably smart about the case of multiple window >>> functions with identical window definitions --- once you've got one >>> lag() in the query, adding more isn't going to cost much. > >> Out of curiosity, would there be much difference between having multiple >> lag()s in the SELECT and a single one in a CTE? > > Not sure what you're proposing? I don't see how you'd solve this > problem with a CTE, at least not without a join, which seems unlikely > to be a win. select generate_series(1,20) as a , ( generate_series(1,20)*9+random()*7)::int as b into temp table foo; delete from foo where random()>0.5; with j as ( select f,lag(f) over (order by f.a)as g from foo as f ) select (f).*,(g).a as "lag a",(g).b as "lag b", (f).a-(g).a as "diff(a)" ,(f).b-(g).b as "diff(b)" from j; -- ⚂⚃ 100% natural
On 2012-10-20, Berend Tober <btober@broadstripe.net> wrote: > Thalis Kalfigkopoulos wrote: >> On Sat, Oct 20, 2012 at 8:02 AM, Raymond O'Donnell <rod@iol.ie> wrote: >>> On 20/10/2012 11:54, ochaussavoine wrote: >>>> I have a table 'tmvt' with a field 'created' in the row, and would like to >>>> compute the difference between successive rows. The solution I found is: >>> >>> I think you can do it with a window function. >>> >> >> In particular you're looking probably for the lag() window function. > > What about if there is more than one column you want the > difference for (... coincidentally I am writing a article on this > topic right now! ...), say a table which is used to record a > metered quantity at not-quite regular intervals: > > CREATE TABLE electricity > ( > current_reading_date date, > current_meter_reading integer > ); > > > with sample data: > > > '2012-09-07',2158 > '2012-10-05',3018 > > > > and I want an output such as: > > > Meter Read on October 5 > > Current Previous kWh > Reading Reading Used > ----------------------------------- > 3018 - 2158 = 860 > > Number service days = 28 > > > I am working on a write-up of a neat solution using CTE's, but > would be interested in other's views. electricity meter may bis a bad example as usage meters often have fewer digits than are needed to track all historical usage eg: '2012-05-07',997743 '2012-06-06',999601 '2012-07-05',000338 '2012-08-06',001290 '2012-09-07',002158 '2012-10-05',003018 -- ⚂⚃ 100% natural
Jasen Betts wrote: > electricity meter may bis a bad example as usage meters often have > fewer digits than are needed to track all historical usage > > eg: > > '2012-05-07',997743 > '2012-06-06',999601 > '2012-07-05',000338 > '2012-08-06',001290 > '2012-09-07',002158 > '2012-10-05',003018 Wrap-around can be handled pretty easily. It's meter replacement that is a challenge. :-) SELECT current_reading_date as "reading date", lag(current_meter_reading, 1) over (order by current_reading_date) as "prior reading", current_meter_reading as "current reading", (1000000000 + current_meter_reading - lag(current_meter_reading, 1) over (order by current_reading_date)) % 1000000 as usage from electricity; reading date | prior reading | current reading | usage --------------+---------------+-----------------+------- 2012-05-07 | | 997743 | 2012-06-06 | 997743 | 999601 | 1858 2012-07-05 | 999601 | 338 | 737 2012-08-06 | 338 | 1290 | 952 2012-09-07 | 1290 | 2158 | 868 2012-10-05 | 2158 | 3018 | 860 (6 rows) -Kevin