Thread: Sum of multiplied deltas
Hello! I've the following data: datetime | val1 | val2 time1 | 4 | 40% time2 | 7 | 30% time3 | 12 | 20% ... I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... datetime is ordered (and unique and has also an id). Rows are in the area of millions. How is it done best? 1.) Self join with one row shift? 2.) function? Any hint? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/
On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote: > Hello! > > I've the following data: > datetime | val1 | val2 > time1 | 4 | 40% > time2 | 7 | 30% > time3 | 12 | 20% > ... > > I'd like to sum up the following: > > (7-4)*30% + (12-7)*20% + ... This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only feature, and dodgy hacks are the rule until you can use them. Cheers, David. > > datetime is ordered (and unique and has also an id). > > Rows are in the area of millions. > > How is it done best? > 1.) Self join with one row shift? > 2.) function? > > Any hint? > > Thnx. > > Ciao, > Gerhard > > -- > http://www.wiesinger.com/ > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Any hints for an 8.3 environment (currently)? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Mon, 8 Jun 2009, David Fetter wrote: > On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote: >> Hello! >> >> I've the following data: >> datetime | val1 | val2 >> time1 | 4 | 40% >> time2 | 7 | 30% >> time3 | 12 | 20% >> ... >> >> I'd like to sum up the following: >> >> (7-4)*30% + (12-7)*20% + ... > > This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only > feature, and dodgy hacks are the rule until you can use them. > > Cheers, > David. >> >> datetime is ordered (and unique and has also an id). >> >> Rows are in the area of millions. >> >> How is it done best? >> 1.) Self join with one row shift? >> 2.) function? >> >> Any hint? >> >> Thnx. >> >> Ciao, >> Gerhard >> >> -- >> http://www.wiesinger.com/ >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Gerhard Wiesinger <lists@wiesinger.com> writes: > Any hints for an 8.3 environment (currently)? Use a FOR loop in plpgsql and remember the previous row's value in a variable. regards, tom lane
On 8 Jun 2009, at 19:01, David Fetter wrote: >> Hello! >> >> I've the following data: >> datetime | val1 | val2 >> time1 | 4 | 40% >> time2 | 7 | 30% >> time3 | 12 | 20% >> ... >> >> I'd like to sum up the following: >> >> (7-4)*30% + (12-7)*20% + ... > > This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only > feature, and dodgy hacks are the rule until you can use them. > > Cheers, > David. > I do this type of thing now and again using a self join with an offset. select test_a.val1 - test_b.val1 from test test_a inner join test test_b on test_a.pkey = test_b.pkey - 1; Thought I was quite clever the first time, didn't know it was a dodgy hack! I'm trying to learn more about windowing before 8.4, how would this example be done with that? Regards Oliver
Gerhard Wiesinger wrote: > I've the following data: > datetime | val1 | val2 > time1 | 4 | 40% > time2 | 7 | 30% > time3 | 12 | 20% > ... > > I'd like to sum up the following: > > (7-4)*30% + (12-7)*20% + ... > > datetime is ordered (and unique and has also an id). > 1.) Self join with one row shift? Self-join only helps if the id comes from a gap-less sequence. Row numbers could be used if available, but they are not in 8.3. A possible way of solving this (when a procedural method is not wanted) is to lay out the dataset in temporary arrays that are repeated for every row you need to compute. That can be arranged in a self-contained sql query, like this: select sum((av1[i]-av1[i-1])*av2[i]) from (select av1,av2,generate_series(2,array_upper(av1,1)) as i from (select array_accum(val1) as av1, array_accum(val2) as av2 from (select val1,val2 from TABLENAME order by datetime) s0 ) s1 ) s2 However, this would probably be too slow for a large dataset. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Hello, I've found a dirty hack with custom GUC variables here: http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php Although dirty, it seems that it beats the windowing performances of 8.4. So I wonder if there are any concern about this... Cheers, Marc Mamin
Marc
very concerned about the 'dirty' classification
any suggestions?
Martin Gainty
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
> Subject: Re: [GENERAL] Sum of multiplied deltas
> Date: Tue, 9 Jun 2009 13:46:03 +0200
> From: M.Mamin@intershop.de
> To: oliver.lists@gtwm.co.uk; pgsql-general@postgresql.org
>
>
>
> Hello,
>
> I've found a dirty hack with custom GUC variables here:
> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php
>
> Although dirty, it seems that it beats the windowing performances of
> 8.4.
>
> So I wonder if there are any concern about this...
>
> Cheers,
>
> Marc Mamin
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Windows Live™: Keep your life in sync. Check it out.
very concerned about the 'dirty' classification
any suggestions?
Martin Gainty
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.
> Subject: Re: [GENERAL] Sum of multiplied deltas
> Date: Tue, 9 Jun 2009 13:46:03 +0200
> From: M.Mamin@intershop.de
> To: oliver.lists@gtwm.co.uk; pgsql-general@postgresql.org
>
>
>
> Hello,
>
> I've found a dirty hack with custom GUC variables here:
> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php
>
> Although dirty, it seems that it beats the windowing performances of
> 8.4.
>
> So I wonder if there are any concern about this...
>
> Cheers,
>
> Marc Mamin
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
Windows Live™: Keep your life in sync. Check it out.
Hi everybody, altough I know that the way over a Trigger is preferred, I can't see what is wrong in my function. It must be an escaping error: CREATE OR REPLACE FUNCTION user_insert_vector_data(text,text,text,numeric,numeric,numeric,text) RETURNS void AS $$ DECLARE art_nr ALIAS for $1; bezeichnung ALIAS FOR $2; beschreibung ALIAS FOR $3; preis ALIAS FOR $4; steuersatz ALIAS FOR $5; aktionspreis ALIAS FOR $6; stichworte ALIAS for $7; vect tsvector; BEGIN vect := setweight(to_tsvector('german',coalesce(stichworte,'')),'A') || setweight(to_tsvector('german',coalesce(beschreibung,'')),'B'); EXECUTE 'INSERT INTO produkte (art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector) VALUES ('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||', '||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')'; END; $$ LANGUAGE plpgsql; Firering this statement: select user_insert_vector_data('adfvb','adfvb','adfvb',15.5,25.50,2,'alpha,beta,cesar'); results in this error: FEHLER: Syntaxfehler bei »:« LINE 3: ... 25.50,2,'alpha,beta,cesar','beta':2A 'adfvb... ^ QUERY: INSERT INTO produkte (art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector) VALUES ('adfvb','adfvb','adfvb',15.5, 25.50,2,'alpha,beta,cesar','beta':2A 'adfvb':4B 'alpha':1A 'cesar':3A) CONTEXT: PL/pgSQL function "user_insert_vector_data" line 14 at EXECUTE statement vector is of datatype tsvector. So I don't understand why pg is complaining ... Thank's for any hints and sorry for annoying if I am too blind ... Cheers Andy ---------------------------------- Netzmeister St.Pauli St.Pauli - Hamburg - Germany Andreas Wenk
Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes: > EXECUTE 'INSERT INTO produkte > (art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector) > VALUES > ('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||', > '||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')'; This is not going to work with such a half-baked approach to quoting the data values --- any value with ' or \ in it will break it. You could use quote_literal(), but I wonder why you are bothering with EXECUTE at all. If you just did the INSERT directly it would run faster and be a lot less ugly. regards, tom lane
Tom Lane wrote: > Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes: >> EXECUTE 'INSERT INTO produkte >> (art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector) >> VALUES >> ('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||', >> '||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')'; > > This is not going to work with such a half-baked approach to quoting the > data values --- any value with ' or \ in it will break it. You could > use quote_literal(), but I wonder why you are bothering with EXECUTE at > all. If you just did the INSERT directly it would run faster and be a > lot less ugly. > > regards, tom lane Tom, thanks for the hint! I was really doing the wrong thing. There have been even more errors what made me confused. Now I got it ... Cheers Andy
Hello, Finally I used a function below which works well. Only one problem is left: It polutes the buffer cache because of the cursor. Any idea to get rid of this behavior? BTW: WINDOWING FUNCTION of 8.4 should help but noone could provide an examples how this could work. Any further comments how to implement it? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ CREATE TYPE PS AS ( sum_m1 double precision, sum_m2 double precision ); DROP FUNCTION getSum(); CREATE OR REPLACE FUNCTION getSum(IN start_ts timestamp with time zone, IN stop_ts timestamp with time zone) RETURNS PS AS $$ DECLARE curs CURSOR FOR SELECT * FROM log_entries WHERE datetime >= start_ts AND datetime <= stop_ts ORDER BY datetime ; row log_entries%ROWTYPE; i bigint = 0; datetime_old timestamp with time zone; old double precision; sum_m1 double precision = 0; sum_m2 double precision = 0; psum PS; BEGIN OPEN curs; LOOP FETCH curs INTO row; EXIT WHEN NOT FOUND; IF row.col IS NOT NULL THEN IF i > 0 THEN sum_m1 = sum_m1 + (row.col - old) * 0.01 * row.col2; sum_m2 = sum_m2 + EXTRACT('epoch' FROM row.datetime - datetime_old) * row.col3; END IF; i = i + 1; old = row.old; datetime_old = row.datetime; END IF; END LOOP; CLOSE curs; psum.sum_m1 = sum_m1; psum.sum_m2 = sum_m2; RETURN psum; END; $$ LANGUAGE plpgsql; On Mon, 8 Jun 2009, Gerhard Wiesinger wrote: > Hello! > > I've the following data: > datetime | val1 | val2 > time1 | 4 | 40% > time2 | 7 | 30% > time3 | 12 | 20% > ... > > I'd like to sum up the following: > > (7-4)*30% + (12-7)*20% + ... > > datetime is ordered (and unique and has also an id). > > Rows are in the area of millions. > > How is it done best? > 1.) Self join with one row shift? > 2.) function? > > Any hint? > > Thnx. > > Ciao, > Gerhard > > -- > http://www.wiesinger.com/ > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >