Thread: update from multiple rows

update from multiple rows

From
"adam etienne"
Date:
hi
I have some trouble updating a table like this one :
date    |  data_raw  |  data_sys
12-01   |     5      |    4.5
13-01   |     6      |    6
14-01   |     7      |    8
 

I would like to update the 'data_sys' column by computing over multiple
'data_raw' values. I mean, for example :
data_sys(13-01) = (data_raw['12-01'] + data_raw['13-01'] +
data_raw['14-01']) / 3;

I thought of a function that fetches the three data_raw rows for each
row... but it was obviously much too slow...

Is there a more efficient way to achieve this ?
Thanks in advance.. this would help me very much..

Etienne Adam




Re: update from multiple rows

From
Michael Fuhr
Date:
On Sat, Jan 22, 2005 at 12:51:20PM +0000, adam etienne wrote:
>
>  I have some trouble updating a table like this one :
>  date     |     data_raw     |     data_sys
>  12-01   |   5                   |       4.5
>  13-01   |   6                   |       6
>  14-01   |   7                   |       8
> 
> I would like to update the 'data_sys' row by computing values of multiple 
> 'data_raw' values. I mean for example :
> data_sys(13-01) = (data_raw['12-01'] + data_raw['13-01'] + 
> data_raw['14-01'] )/3;

Is there a reason to maintain data_sys in the table?  Could you use
a view instead?  A view could do self joins (join the table against
itself) and perform the necessary calculations on the fly when you
make a query.  That could be expensive if you select large data
sets, but for small slices of data it might suffice.
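A minimal sketch of the view idea (assuming the table is named mytable, its
date column is of type DATE, and data_sys is the three-day average described
in the question):

```sql
-- Compute data_sys on the fly via a self join instead of storing it.
-- "mytable" and the column names are taken from the thread; adjust as needed.
CREATE VIEW mytable_sys AS
SELECT MT1.date,
       MT1.data_raw,
       avg(MT2.data_raw) AS data_sys   -- three-day centered average
FROM mytable MT1
JOIN mytable MT2
  ON MT2.date BETWEEN MT1.date - 1 AND MT1.date + 1
GROUP BY MT1.date, MT1.data_raw;
```

Selecting a narrow date range from such a view stays cheap if mytable.date is
indexed; selecting the whole view recomputes everything.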

Another possibility might be to use a trigger to recalculate data_sys
when records are inserted, updated, or deleted.  You'd still have
to do a potentially expensive one-time update of the entire table,
but future updates would then touch only the rows that depend on
the data being inserted, updated, or deleted, and the calculated
values would always be current.  Using a trigger would require some
care, however, to avoid cascading updates that are unnecessary or
that could result in infinite recursion.
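A sketch of the trigger approach, using syntax from more recent PostgreSQL
versions (the table and column names are assumed from the thread; DELETE
handling is omitted for brevity):

```sql
-- Recompute data_sys for the changed row and its two neighbours.
CREATE OR REPLACE FUNCTION recalc_data_sys() RETURNS trigger AS $$
BEGIN
    UPDATE mytable t
    SET data_sys = (SELECT avg(m.data_raw)
                    FROM mytable m
                    WHERE m.date BETWEEN t.date - 1 AND t.date + 1)
    WHERE t.date BETWEEN NEW.date - 1 AND NEW.date + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fire only when data_raw changes; the trigger's own UPDATE touches
-- only data_sys, so it does not retrigger itself (no infinite recursion).
CREATE TRIGGER mytable_recalc
AFTER INSERT OR UPDATE OF data_raw ON mytable
FOR EACH ROW EXECUTE PROCEDURE recalc_data_sys();
```

Restricting the trigger to UPDATE OF data_raw is what breaks the recursion
mentioned above; without it, updating data_sys would fire the trigger again.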

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/


Re: update from multiple rows

From
"adam etienne"
Date:
Thanks for your answer.
In fact the computation is somewhat more complex than an average and the
data set is quite large... I did some tests with views & triggers but they
are too slow..
Moreover, sometimes I need to do big insertions or updates, and at other
times just small updates of this table...
I would like to apply a trigger only for the small updates, but I don't know
how to proceed.. Maybe with a condition inside the trigger.. but that adds
computation time...

Thanks again,
Etienne Adam







Re: update from multiple rows

From
Michael Fuhr
Date:
On Sun, Jan 23, 2005 at 11:36:11AM +0000, adam etienne wrote:

> In fact the computation is somewhat more complex than an average and the 
> data set is quite large... I did some test with view & triggers but it's 
> too slow..

Can you provide any more detail about the algorithm and the number
of rows that you might have to insert or update?  How did the test
triggers work?  If you used row-level triggers and if a trigger on
one row updated multiple rows, then you might have been updating
rows more times than necessary (once by explicit update and one or
more unnecessary times by triggers on other rows).
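For the bulk-load case, one way to avoid repeated per-row recalculation is to
disable the trigger, load the data, and recompute in a single pass. A sketch,
assuming a trigger named mytable_recalc exists and the file path is a
placeholder:

```sql
BEGIN;

-- Suspend per-row recalculation during the bulk load.
ALTER TABLE mytable DISABLE TRIGGER mytable_recalc;

-- Hypothetical bulk load; replace the path with your actual data source.
COPY mytable (date, data_raw) FROM '/tmp/new_data.csv' WITH CSV;

-- One-pass recompute over the whole table (or restrict by date range).
UPDATE mytable t
SET data_sys = s.avg_raw
FROM (SELECT m1.date, avg(m2.data_raw) AS avg_raw
      FROM mytable m1
      JOIN mytable m2 ON m2.date BETWEEN m1.date - 1 AND m1.date + 1
      GROUP BY m1.date) s
WHERE t.date = s.date;

ALTER TABLE mytable ENABLE TRIGGER mytable_recalc;

COMMIT;
```

This keeps small interactive updates on the trigger path while bulk loads pay
the recomputation cost exactly once.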

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/


Re: update from multiple rows

From
franco
Date:
I understand data_sys is the average value over three days, from the day
before to the day after.
This should do what you want, in one pass. Check the average function in
the subselect. If what you want is to divide by 3 no matter how many
records were found, enable the commented line instead.

UPDATE mytable SET data_sys = TMP.average
FROM (
    -- get the averages by date
    SELECT
        MT1.date AS date,
        avg(MT2.data_raw) AS average
        -- sum(MT2.data_raw) / 3 AS average
    FROM
        mytable MT1
        INNER JOIN mytable MT2 ON (MT2.date BETWEEN MT1.date - 1 AND MT1.date + 1)
    GROUP BY
        MT1.date
) AS TMP
WHERE
    mytable.date = TMP.date;

Hope this is what you were looking for.




Re: update from multiple rows

From
mrblonde@locked.myftp.org
Date:
Thanks a lot.. that is what I was searching for.. In fact your query is very
good for small changes, but I will have to use another method when updating
all my rows, because the performance is not very good, alas.

My data set contains something like 40000 rows to update among 1+ million
records, and data_raw and data_sys are of type "real"... The complete update
took 40 minutes on a 256 MB, Athlon 2400, kernel 2.6 machine with no other
load during the execution of the query.

Is this normal ? Does the number of columns in the table matter a lot (the
table contains 12 reals and 4 integers) ?

I found that using an intermediate table which stores, for every row, the
value before and the value after helps to gain speed... but it is not a very
nice way, I think..

Thanks again :)
Etienne



Re: update from multiple rows

From
Franco Bruno Borghesi
Date:
Updating 40,000 records should take no longer than a couple of minutes.
I think you should optimise your query before going any further.

You have an inner SELECT statement that executes before anything else. It
joins EVERY row in your table (1,000,000+) with at most 3 other rows in
the same table, so you will end up with about 3,000,000+ rows... but you
are interested in only 40,000 of them!

To make it simple, add a WHERE condition to fetch only the 40,000 rows
you are interested in and discard the others. Also make sure you have
indexed the attributes you are filtering on, and the date attribute too.
You should use EXPLAIN ANALYZE on the inner query to check how it improves.
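A sketch of the filtered version (assuming the 40,000 rows of interest can be
identified by a date range; the index name and the range bounds below are
placeholders):

```sql
-- Index the join/filter attribute so the self join and the final
-- UPDATE ... WHERE mytable.date = TMP.date can use index scans.
CREATE INDEX mytable_date_idx ON mytable (date);

UPDATE mytable SET data_sys = TMP.average
FROM (
    SELECT MT1.date, avg(MT2.data_raw) AS average
    FROM mytable MT1
    JOIN mytable MT2 ON MT2.date BETWEEN MT1.date - 1 AND MT1.date + 1
    -- Restrict to the rows you actually need to update (placeholder bounds).
    WHERE MT1.date BETWEEN DATE '2005-01-12' AND DATE '2005-01-14'
    GROUP BY MT1.date
) AS TMP
WHERE mytable.date = TMP.date;
```

Running EXPLAIN ANALYZE on the inner SELECT alone shows whether the planner
picks the index and how many rows the join really produces.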

Once your SELECT query runs fast enough, the UPDATE should go much 
faster too.

The number of columns matters, but as I said, I don't think it's an 
UPDATE problem.

If you don't find the way to speed your query up, try posting to the 
performance list.
