Thread: update from multiple rows
hi,

I have some trouble updating a table like this one:

  date  | data_raw | data_sys
  12-01 |        5 |      4.5
  13-01 |        6 |        6
  14-01 |        7 |        8

I would like to update the 'data_sys' column by computing values from
multiple 'data_raw' rows. I mean, for example:

  data_sys('13-01') = (data_raw['12-01'] + data_raw['13-01'] +
                       data_raw['14-01']) / 3;

I thought of a function that fetches the 3 data_raw rows for each row,
but it was obviously much too slow...

Is there a more efficient way to achieve this?
Thanks in advance... This could help me very much.

Etienne Adam

_________________________________________________________________
For free e-mail, it's got to be "MSN Hotmail" http://www.hotmail.com/
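[A minimal sketch of the calculation being asked about: each row's data_sys is the average of data_raw over a three-day window centered on that row. The function name and the edge behavior (edge rows average whatever neighbors exist) are assumptions for illustration, not something the poster specified.]

```python
def centered_average(values):
    """Return, for each value, the mean of it and its immediate neighbors."""
    result = []
    for i in range(len(values)):
        # Window of up to three values: previous, current, next.
        window = values[max(0, i - 1):i + 2]
        result.append(sum(window) / len(window))
    return result

data_raw = [5, 6, 7]                    # rows for 12-01, 13-01, 14-01
data_sys = centered_average(data_raw)
print(data_sys[1])                      # 13-01: (5 + 6 + 7) / 3 = 6.0
```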
On Sat, Jan 22, 2005 at 12:51:20PM +0000, adam etienne wrote:
> I have some trouble updating a table like this one:
>
>   date  | data_raw | data_sys
>   12-01 |        5 |      4.5
>   13-01 |        6 |        6
>   14-01 |        7 |        8
>
> I would like to update the 'data_sys' column by computing values from
> multiple 'data_raw' rows. I mean, for example:
>
>   data_sys('13-01') = (data_raw['12-01'] + data_raw['13-01'] +
>                        data_raw['14-01']) / 3;

Is there a reason to maintain data_sys in the table?  Could you use
a view instead?  A view could do self joins (join the table against
itself) and perform the necessary calculations on the fly when you
make a query.  That could be expensive if you select large data
sets, but for small slices of data it might suffice.

Another possibility might be to use a trigger to recalculate data_sys
when records are inserted, updated, or deleted.  You'd still have
to do a potentially expensive one-time update of the entire table,
but future updates would then touch only the rows that depend on
the data being inserted, updated, or deleted, and the calculated
values would always be current.  Using a trigger would require some
care, however, to avoid cascading updates that are unnecessary or
that could result in infinite recursion.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/
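[A rough, runnable illustration of the view approach suggested above, using the stdlib sqlite3 module with integer day numbers standing in for dates; the table, view, and column names are invented for the example. The self-join matches each row with its neighbors and computes the average on the fly at query time.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (day INTEGER PRIMARY KEY, data_raw REAL);
    INSERT INTO mytable VALUES (12, 5), (13, 6), (14, 7);

    -- Self-join: each row joined with up to three rows (itself and its
    -- neighbors); the average is computed when the view is queried.
    CREATE VIEW data_sys_view AS
    SELECT t1.day AS day, avg(t2.data_raw) AS data_sys
    FROM mytable t1
    JOIN mytable t2 ON t2.day BETWEEN t1.day - 1 AND t1.day + 1
    GROUP BY t1.day;
""")

for day, data_sys in conn.execute("SELECT * FROM data_sys_view ORDER BY day"):
    print(day, data_sys)
```

Because nothing is stored, the values can never go stale; the trade-off, as noted above, is that the join cost is paid on every query.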
Thanks for your answer.

In fact the computation is somewhat more complex than an average and the
data set is quite large...  I did some tests with views & triggers but
it's too slow.  Moreover, sometimes I need to do big insertions or
updates, and at other times only small updates of this table.  I would
like to apply a trigger only for the small updates, but I don't know how
to proceed.  Maybe with a condition inside the trigger?  But that adds
computation time...

Thanks again,
Etienne Adam

> > I have some trouble updating a table like this one:
> >
> >   date  | data_raw | data_sys
> >   12-01 |        5 |      4.5
> >   13-01 |        6 |        6
> >   14-01 |        7 |        8
> >
> > I would like to update the 'data_sys' column by computing values from
> > multiple 'data_raw' rows. I mean, for example:
> >
> >   data_sys('13-01') = (data_raw['12-01'] + data_raw['13-01'] +
> >                        data_raw['14-01']) / 3;
>
> Is there a reason to maintain data_sys in the table?  Could you use
> a view instead?  A view could do self joins (join the table against
> itself) and perform the necessary calculations on the fly when you
> make a query.  That could be expensive if you select large data
> sets, but for small slices of data it might suffice.
>
> Another possibility might be to use a trigger to recalculate data_sys
> when records are inserted, updated, or deleted.  You'd still have
> to do a potentially expensive one-time update of the entire table,
> but future updates would then touch only the rows that depend on
> the data being inserted, updated, or deleted, and the calculated
> values would always be current.  Using a trigger would require some
> care, however, to avoid cascading updates that are unnecessary or
> that could result in infinite recursion.
On Sun, Jan 23, 2005 at 11:36:11AM +0000, adam etienne wrote:
> In fact the computation is somewhat more complex than an average and
> the data set is quite large...  I did some tests with views & triggers
> but it's too slow.

Can you provide any more detail about the algorithm and the number
of rows that you might have to insert or update?

How did the test triggers work?  If you used row-level triggers and
if a trigger on one row updated multiple rows, then you might have
been updating rows more times than necessary (once by explicit update
and one or more unnecessary times by triggers on other rows).

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/
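[A sketch of the trigger idea discussed in this exchange, in sqlite3 syntax with invented names: when data_raw changes on one row, recompute data_sys for that row and its two neighbors only. The trigger watches data_raw alone, so its own write to data_sys does not re-fire it, sidestepping the recursion warned about above. PostgreSQL trigger syntax differs (PL/pgSQL function plus CREATE TRIGGER), but the shape of the idea is the same.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (day INTEGER PRIMARY KEY, data_raw REAL, data_sys REAL);
    INSERT INTO mytable (day, data_raw) VALUES (12, 5), (13, 6), (14, 7);

    -- Recalculate only the rows whose window contains the changed row.
    CREATE TRIGGER refresh_data_sys AFTER UPDATE OF data_raw ON mytable
    BEGIN
        UPDATE mytable SET data_sys =
            (SELECT avg(t2.data_raw) FROM mytable t2
             WHERE t2.day BETWEEN mytable.day - 1 AND mytable.day + 1)
        WHERE day BETWEEN NEW.day - 1 AND NEW.day + 1;
    END;
""")

# A single-row change touches only three rows, not the whole table.
conn.execute("UPDATE mytable SET data_raw = 9 WHERE day = 13")
print(conn.execute("SELECT data_sys FROM mytable WHERE day = 13").fetchone())
```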
I understand data_sys is the average value over 3 days, from the day
before to the day after.  This should do what you want, in one pass.
Check the average function in the subselect: if what you want is to
divide by 3 no matter how many records were found, enable the commented
line.

UPDATE mytable
SET data_sys = TMP.average
FROM (
    -- get the averages by date
    SELECT MT1.date AS date,
           avg(MT2.data_raw) AS average
           -- sum(MT2.data_raw)/3 AS average
    FROM mytable MT1
    INNER JOIN mytable MT2
        ON (MT2.date BETWEEN MT1.date - 1 AND MT1.date + 1)
    GROUP BY MT1.date
) AS TMP
WHERE mytable.date = TMP.date;

Hope this is what you were looking for.

adam etienne wrote:
> hi,
> I have some trouble updating a table like this one:
>
>   date  | data_raw | data_sys
>   12-01 |        5 |      4.5
>   13-01 |        6 |        6
>   14-01 |        7 |        8
>
> I would like to update the 'data_sys' column by computing values from
> multiple 'data_raw' rows. I mean, for example:
>
>   data_sys('13-01') = (data_raw['12-01'] + data_raw['13-01'] +
>                        data_raw['14-01']) / 3;
>
> I thought of a function that fetches the 3 data_raw rows for each row,
> but it was obviously much too slow...
>
> Is there a more efficient way to achieve this?
> Thanks in advance... This could help me very much.
>
> Etienne Adam
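[The UPDATE ... FROM form above is PostgreSQL syntax. To sanity-check the same single-pass update in a self-contained way, here is an equivalent version written as a correlated subquery, runnable with the stdlib sqlite3 module; integer days stand in for dates and the names are illustrative.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (day INTEGER PRIMARY KEY, data_raw REAL, data_sys REAL);
    INSERT INTO mytable (day, data_raw) VALUES (12, 5), (13, 6), (14, 7);
""")

# One pass over the table: each row's data_sys becomes the average of
# data_raw over the row itself and its immediate neighbors.
conn.execute("""
    UPDATE mytable SET data_sys =
        (SELECT avg(t2.data_raw) FROM mytable t2
         WHERE t2.day BETWEEN mytable.day - 1 AND mytable.day + 1)
""")

print(conn.execute("SELECT day, data_sys FROM mytable ORDER BY day").fetchall())
```

Note that avg() here divides by the number of rows actually found, so the edge rows (12 and 14) average only two values, matching the behavior of the uncommented line in the query above.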
Thanks a lot, that is what I was searching for.  In fact your query is
very good for small changes, but I will have to use another method when
updating all my rows, because the performance is not very good, alas.

My data set contains something like 40,000 rows to update among 1+
million records, and data_raw and data_sys are of type "real".  The
complete update took 40 minutes on a 256 MB, Athlon 2400, kernel 2.6
machine, with no other load during the execution of the query.

Is this normal?  Does the number of columns of the table matter a lot
(the table contains 12 reals and 4 integers)?

I found that using an intermediate table which stores, for every row,
the value before and the value after helps to gain speed.  But it is not
a very nice way, I think.

Thanks again :)
Etienne
Updating 40,000 records should take no longer than a couple of minutes.
I think you should optimise your query before going any further.

You have an inner SELECT statement that executes before anything else.
It joins EVERY row in your table (1,000,000+) with at most 3 other rows
in the same table, so you end up with about 3,000,000+ rows... but you
are interested in only 40,000 rows!

To make it simple, add a WHERE condition to fetch only the 40,000 rows
you are interested in and discard the others.  Make sure also that you
have indexed the attributes you are filtering on, and the date attribute
too.  You should use EXPLAIN ANALYZE on the inner query to check how it
improves.  Once your SELECT query runs fast enough, the UPDATE should go
much faster too.

The number of columns matters, but as I said, I don't think it's an
UPDATE problem.  If you don't find the way to speed your query up, try
posting to the performance list.

mrblonde@locked.myftp.org wrote:
> Thanks a lot, that is what I was searching for.  In fact your query is
> very good for small changes, but I will have to use another method when
> updating all my rows, because the performance is not very good, alas.
>
> My data set contains something like 40,000 rows to update among 1+
> million records, and data_raw and data_sys are of type "real".  The
> complete update took 40 minutes on a 256 MB, Athlon 2400, kernel 2.6
> machine, with no other load during the execution of the query.
>
> Is this normal?  Does the number of columns of the table matter a lot
> (the table contains 12 reals and 4 integers)?
>
> I found that using an intermediate table which stores, for every row,
> the value before and the value after helps to gain speed.  But it is
> not a very nice way, I think.
>
> Thanks again :)
> Etienne
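[A sketch of the advice above: restrict the expensive recalculation to the slice of rows actually being changed with a WHERE clause, and index the join/filter column. This uses sqlite3 with made-up names and a made-up range of days; in PostgreSQL you would verify the plan with EXPLAIN ANALYZE rather than just trusting the index.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (day INTEGER, data_raw REAL, data_sys REAL)")
conn.executemany("INSERT INTO mytable (day, data_raw) VALUES (?, ?)",
                 [(d, float(d)) for d in range(1, 101)])

# Index the attribute used for joining and filtering.
conn.execute("CREATE INDEX mytable_day_idx ON mytable (day)")

# Recompute data_sys only for the small slice being changed, instead of
# joining every row in the table against its neighbors.
conn.execute("""
    UPDATE mytable SET data_sys =
        (SELECT avg(t2.data_raw) FROM mytable t2
         WHERE t2.day BETWEEN mytable.day - 1 AND mytable.day + 1)
    WHERE day BETWEEN 40 AND 50
""")

print(conn.execute("SELECT count(*) FROM mytable WHERE data_sys IS NOT NULL").fetchone())
```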