Thread: Sum of multiplied deltas

Sum of multiplied deltas

From
Gerhard Wiesinger
Date:
Hello!

I've the following data:
datetime | val1 | val2
time1    |    4 | 40%
time2    |    7 | 30%
time3    |   12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


Re: Sum of multiplied deltas

From
David Fetter
Date:
On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote:
> Hello!
>
> I've the following data:
> datetime | val1 | val2
> time1    |    4 | 40%
> time2    |    7 | 30%
> time3    |   12 | 20%
> ...
>
> I'd like to sum up the following:
>
> (7-4)*30% + (12-7)*20% + ...

This is best done in 8.4 using Windowing.  Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.

Cheers,
David.
>
> datetime is ordered (and unique and has also an id).
>
> Rows are in the area of millions.
>
> How is it done best?
> 1.) Self join with one row shift?
> 2.) function?
>
> Any hint?
>
> Thnx.
>
> Ciao,
> Gerhard
>
> --
> http://www.wiesinger.com/
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Sum of multiplied deltas

From
Gerhard Wiesinger
Date:
Any hints for an 8.3 environment (currently)?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Mon, 8 Jun 2009, David Fetter wrote:

> On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote:
>> Hello!
>>
>> I've the following data:
>> datetime | val1 | val2
>> time1    |    4 | 40%
>> time2    |    7 | 30%
>> time3    |   12 | 20%
>> ...
>>
>> I'd like to sum up the following:
>>
>> (7-4)*30% + (12-7)*20% + ...
>
> This is best done in 8.4 using Windowing.  Sadly, it's an 8.4-only
> feature, and dodgy hacks are the rule until you can use them.
>
> Cheers,
> David.
>>
>> datetime is ordered (and unique and has also an id).
>>
>> Rows are in the area of millions.
>>
>> How is it done best?
>> 1.) Self join with one row shift?
>> 2.) function?
>>
>> Any hint?
>>
>> Thnx.
>>
>> Ciao,
>> Gerhard
>>
>> --
>> http://www.wiesinger.com/
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Sum of multiplied deltas

From
Tom Lane
Date:
Gerhard Wiesinger <lists@wiesinger.com> writes:
> Any hints for an 8.3 environment (currently)?

Use a FOR loop in plpgsql and remember the previous row's value in a
variable.

            regards, tom lane

Re: Sum of multiplied deltas

From
Oliver Kohll - Mailing Lists
Date:
On 8 Jun 2009, at 19:01, David Fetter wrote:

>> Hello!
>>
>> I've the following data:
>> datetime | val1 | val2
>> time1    |    4 | 40%
>> time2    |    7 | 30%
>> time3    |   12 | 20%
>> ...
>>
>> I'd like to sum up the following:
>>
>> (7-4)*30% + (12-7)*20% + ...
>
> This is best done in 8.4 using Windowing.  Sadly, it's an 8.4-only
> feature, and dodgy hacks are the rule until you can use them.
>
> Cheers,
> David.
>


I do this type of thing now and again using a self join with an offset.

select test_a.val1 - test_b.val1
from test test_a inner join test test_b
on test_a.pkey = test_b.pkey - 1;

Thought I was quite clever the first time, didn't know it was a dodgy
hack! I'm trying to learn more about windowing before 8.4, how would
this example be done with that?

Regards
Oliver

Re: Sum of multiplied deltas

From
"Daniel Verite"
Date:
    Gerhard Wiesinger wrote:

> I've the following data:
> datetime | val1 | val2
> time1    |    4 | 40%
> time2    |    7 | 30%
> time3    |   12 | 20%
> ...
>
> I'd like to sum up the following:
>
> (7-4)*30% + (12-7)*20% + ...
>
> datetime is ordered (and unique and has also an id).

> 1.) Self join with one row shift?

Self-join only helps if the id comes from a gap-less sequence. Row
numbers could be used if available, but they are not in 8.3.
A possible way of solving this (when a procedural method is not wanted)
is to lay out the dataset in temporary arrays that are repeated for
every row you need to compute. That can be arranged in a self-contained
sql query, like this:

select sum((av1[i]-av1[i-1])*av2[i]) from
 (select av1,av2,generate_series(2,array_upper(av1,1)) as i from
  (select array_accum(val1) as av1, array_accum(val2) as av2 from
    (select val1,val2 from TABLENAME order by datetime) s0
  ) s1
 ) s2

However, this would probably be too slow for a large dataset.

Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
 http://www.manitou-mail.org

Re: Sum of multiplied deltas

From
"Marc Mamin"
Date:

Hello,

I've found a dirty hack with custom GUC variables here:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php

Although dirty, it seems that it beats the windowing performances of
8.4.

So I wonder if there are any concern about this...

Cheers,

Marc Mamin




Re: Sum of multiplied deltas

From
Martin Gainty
Date:
Marc

very concerned about the 'dirty' classification

any suggestions?
Martin Gainty
______________________________________________
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
 
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni.





> Subject: Re: [GENERAL] Sum of multiplied deltas
> Date: Tue, 9 Jun 2009 13:46:03 +0200
> From: M.Mamin@intershop.de
> To: oliver.lists@gtwm.co.uk; pgsql-general@postgresql.org
>
>
>
> Hello,
>
> I've found a dirty hack with custom GUC variables here:
> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00643.php
>
> Although dirty, it seems that it beats the windowing performances of
> 8.4.
>
> So I wonder if there are any concern about this...
>
> Cheers,
>
> Marc Mamin
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Windows Live™: Keep your life in sync. Check it out.

fulltext search udf

From
Andreas Wenk
Date:
Hi everybody,

altough I know that the way over a Trigger is preferred, I can't see what is wrong in my
function. It must be an escaping error:

CREATE OR REPLACE FUNCTION
user_insert_vector_data(text,text,text,numeric,numeric,numeric,text)
         RETURNS void AS
$$
DECLARE
         art_nr ALIAS for $1;
         bezeichnung ALIAS FOR $2;
         beschreibung ALIAS FOR $3;
         preis ALIAS FOR $4;
         steuersatz ALIAS FOR $5;
         aktionspreis ALIAS FOR $6;
         stichworte ALIAS for $7;
    vect tsvector;
BEGIN
    vect :=    setweight(to_tsvector('german',coalesce(stichworte,'')),'A') ||
        setweight(to_tsvector('german',coalesce(beschreibung,'')),'B');

         EXECUTE 'INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
                 VALUES
('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
                    '||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';

END;
$$ LANGUAGE plpgsql;

Firering this statement:

select user_insert_vector_data('adfvb','adfvb','adfvb',15.5,25.50,2,'alpha,beta,cesar');

results in this error:

FEHLER:  Syntaxfehler bei »:«
LINE 3: ...                 25.50,2,'alpha,beta,cesar','beta':2A 'adfvb...
                                                              ^
QUERY:  INSERT INTO produkte
(art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
                 VALUES ('adfvb','adfvb','adfvb',15.5,
                    25.50,2,'alpha,beta,cesar','beta':2A 'adfvb':4B 'alpha':1A 'cesar':3A)
CONTEXT:  PL/pgSQL function "user_insert_vector_data" line 14 at EXECUTE statement


vector is of datatype tsvector. So I don't understand why pg is complaining ...

Thank's for any hints and sorry for annoying if I am too blind ...

Cheers

Andy

----------------------------------
Netzmeister St.Pauli

St.Pauli - Hamburg - Germany

Andreas Wenk


Re: fulltext search udf

From
Tom Lane
Date:
Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:
>          EXECUTE 'INSERT INTO produkte
> (art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
>                  VALUES
> ('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
>                     '||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';

This is not going to work with such a half-baked approach to quoting the
data values --- any value with ' or \ in it will break it.  You could
use quote_literal(), but I wonder why you are bothering with EXECUTE at
all.  If you just did the INSERT directly it would run faster and be a
lot less ugly.

            regards, tom lane

Re: fulltext search udf

From
Andreas Wenk
Date:
Tom Lane wrote:
> Andreas Wenk <a.wenk@netzmeister-st-pauli.de> writes:
>>          EXECUTE 'INSERT INTO produkte
>> (art_nr,bezeichnung,beschreibung,preis,steuersatz,aktionspreis,stichworte,vector)
>>                  VALUES
>> ('''||art_nr||''','''||bezeichnung||''','''||beschreibung||''','||preis||',
>>                     '||steuersatz||','||aktionspreis||','''||stichworte||''','||vect||')';
>
> This is not going to work with such a half-baked approach to quoting the
> data values --- any value with ' or \ in it will break it.  You could
> use quote_literal(), but I wonder why you are bothering with EXECUTE at
> all.  If you just did the INSERT directly it would run faster and be a
> lot less ugly.
>
>             regards, tom lane
Tom, thanks for the hint! I was really doing the wrong thing. There have
been even more errors what made me confused. Now I got it ...

Cheers

Andy

Re: Sum of multiplied deltas

From
Gerhard Wiesinger
Date:
Hello,

Finally I used a function below which works well. Only one problem is
left: It polutes the buffer cache because of the cursor. Any idea to get
rid of this behavior?

BTW: WINDOWING FUNCTION of 8.4 should help but noone could provide an
examples how this could work. Any further comments how to implement it?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

CREATE TYPE PS AS
(
   sum_m1 double precision,
   sum_m2 double precision
);

DROP FUNCTION getSum();
CREATE OR REPLACE FUNCTION getSum(IN start_ts timestamp with time
zone, IN stop_ts timestamp with time zone) RETURNS PS AS $$
DECLARE
   curs CURSOR FOR
   SELECT
     *
   FROM
     log_entries
   WHERE
     datetime >= start_ts
     AND datetime <= stop_ts
   ORDER BY
     datetime
   ;
   row log_entries%ROWTYPE;
   i bigint = 0;
   datetime_old timestamp with time zone;
   old double precision;
   sum_m1 double precision = 0;
   sum_m2 double precision = 0;
   psum PS;
BEGIN
   OPEN curs;
   LOOP
     FETCH curs INTO row;
     EXIT WHEN NOT FOUND;
     IF row.col IS NOT NULL THEN
       IF i > 0 THEN
         sum_m1 = sum_m1 + (row.col - old) * 0.01 * row.col2;
         sum_m2 = sum_m2 + EXTRACT('epoch' FROM row.datetime - datetime_old) * row.col3;
       END IF;
       i = i + 1;
       old = row.old;
       datetime_old = row.datetime;
     END IF;
   END LOOP;
   CLOSE curs;
   psum.sum_m1 = sum_m1;
   psum.sum_m2 = sum_m2;
   RETURN psum;
END;
$$ LANGUAGE plpgsql;


On Mon, 8 Jun 2009, Gerhard Wiesinger wrote:

> Hello!
>
> I've the following data:
> datetime | val1 | val2
> time1    |    4 | 40%
> time2    |    7 | 30%
> time3    |   12 | 20%
> ...
>
> I'd like to sum up the following:
>
> (7-4)*30% + (12-7)*20% + ...
>
> datetime is ordered (and unique and has also an id).
>
> Rows are in the area of millions.
>
> How is it done best?
> 1.) Self join with one row shift?
> 2.) function?
>
> Any hint?
>
> Thnx.
>
> Ciao,
> Gerhard
>
> --
> http://www.wiesinger.com/
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>