Re: Postgres array parser - Mailing list pgsql-performance

From Marc Mamin
Subject Re: Postgres array parser
Date
Msg-id C4DAC901169B624F933534A26ED7DF310861B23C@JENMAIL01.ad.intershop.net
Whole thread Raw
In response to Re: Postgres array parser  (Aleksej Trofimov <aleksej.trofimov@ruptela.lt>)
List pgsql-performance
> Yes, it would be great, but I haven't found such a function, which
> splits 2 dimensional array into rows =) Maybe we'll modify existing
> function, but unfortunately we have tried hstore type and function in
> postgres and we see a significant performance improvements. So we only
> need to convert existing data into hstore and I think this is a good
> solution.



I haven't tested hstore yet, but I would be interested to find out if it still better perform with custom "numeric"
aggregateson the hstore values.
 

I've made a short "proof of concept"  test with a custom key/value type to achieve such an aggregation.
Something like:


   SELECT x, distinct_sum( (currency,amount)::keyval ) overview  FROM ... GROUP BY x

   x currency     amount
   a      EUR       15.0
   a      EUR        5.0
   a      CHF        7.5
   b      USD       12.0
   =>

   x  overview
   -  --------
   a {(EUR,20.0), (CHF,7.5)}
   b {(USD,10.0)}


regards,

Marc Mamin

 
> On 12/14/2011 11:21 AM, Marc Mamin wrote:
> > Hello,
> >
> > For such cases (see below), it would be nice to have an unnest
> function that only affect the first array dimension.
> >
> > Something like
> >
> > unnest(ARRAY[[1,2],[2,3]], SLICE=1)
> > =>
> > unnest
> > ------
> > [1,2]
> > [2,3]
> >
> >
> > With this function, I imagine that following sql function
> > might beat the plpgsql FOREACH version.
> >
> >
> > CREATE OR REPLACE FUNCTION input_value_un (in_inputs numeric[],
> in_input_nr numeric)
> >     RETURNS numeric AS
> > $BODY$
> >
> >    SELECT u[1][2]
> >    FROM unnest($1, SLICE =1) u
> >    WHERE u[1][1]=in_input_nr
> >    LIMIT 1;
> >
> > $BODY$
> >     LANGUAGE sql IMMUTABLE;
> >
> >
> >
> > best regards,
> >
> > Marc Mamin
> >
> >
> >> -----Original Message-----
> >> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-
> >> owner@postgresql.org] On Behalf Of Pavel Stehule
> >> Sent: Dienstag, 13. Dezember 2011 15:43
> >> To: Aleksej Trofimov
> >> Cc: pgsql-performance@postgresql.org
> >> Subject: Re: [PERFORM] Postgres array parser
> >>
> >> Hello
> >>
> >> 2011/12/13 Aleksej Trofimov<aleksej.trofimov@ruptela.lt>:
> >>> We have tried foreach syntax, but we have noticed performance
> >> degradation:
> >>> Function with for: 203ms
> >>> Function with foreach: ~250ms:
> >>>
> >>> there is functions code:
> >>> CREATE OR REPLACE FUNCTION input_value_fe(in_inputs numeric[],
> >> in_input_nr
> >>> numeric)
> >>>   RETURNS numeric AS
> >>> $BODY$
> >>> declare i numeric[];
> >>> BEGIN
> >>>         FOREACH i SLICE 1 IN ARRAY in_inputs
> >>>             LOOP
> >>>                  if i[1] = in_input_nr then
> >>>                     return i[2];
> >>>                  end if;
> >>>             END LOOP;
> >>>
> >>>     return null;
> >>> END;
> >>> $BODY$
> >>>   LANGUAGE plpgsql VOLATILE
> >>>   COST 100;
> >>>
> >>> CREATE OR REPLACE FUNCTION input_value(in_inputs numeric[],
> >> in_input_nr
> >>> numeric)
> >>>   RETURNS numeric AS
> >>> $BODY$
> >>> declare
> >>>   size int;
> >>> BEGIN
> >>>   size = array_upper(in_inputs, 1);
> >>>     IF size IS NOT NULL THEN
> >>>
> >>>         FOR i IN 1 .. size LOOP
> >>>             if in_inputs[i][1] = in_input_nr then
> >>>                 return in_inputs[i][2];
> >>>             end if;
> >>>         END LOOP;
> >>>     END IF;
> >>>
> >>>     return null;
> >>> END;
> >>> $BODY$
> >>>   LANGUAGE plpgsql VOLATILE
> >>>   COST 100;
> >>>
> >>>
> >>> On 12/13/2011 04:02 PM, Pavel Stehule wrote:
> >>>> Hello
> >>>>
> >>>> do you know FOREACH IN ARRAY statement in 9.1
> >>>>
> >>>> this significantly accelerate iteration over array
> >>>>
> >>>>
> >>>> http://www.depesz.com/index.php/2011/03/07/waiting-for-9-1-
> foreach-
> >> in-array/
> >>>>
> >>>>
> >>>> 2011/12/13 Aleksej Trofimov<aleksej.trofimov@ruptela.lt>:
> >>>>> Hello, I wanted to ask according such a problem which we had
> faced
> >> with.
> >>>>> We are widely using postgres arrays like key->value array by
> doing
> >> like
> >>>>> this:
> >>>>>
> >>>>> {{1,5},{2,6},{3,7}}
> >>>>>
> >>>>> where 1,2,3 are keys, and 5,6,7 are values. In our pgSql
> functions
> >> we are
> >>>>> using self written array_input(array::numeric[], key::numeric)
> >> function
> >>>>> which makes a loop on whole array and searches for key like
> >>>>> FOR i IN 1 .. size LOOP
> >>>>>             if array[i][1] = key then
> >>>>>                 return array[i][2];
> >>>>>             end if;
> >>>>> END LOOP;
> >>>>>
> >>>>> But this was a good solution until our arrays and database had
> >> grown. So
> >>>>> now
> >>>>> FOR loop takes a lot of time to find value of an array.
> >>>>>
> >>>>> And my question is, how this problem of performance could be
> >> solved? We
> >>>>> had
> >>>>> tried pgperl for string parsing, but it takes much more time than
> >> our
> >>>>> current solution. Also we are thinking about self-written C++
> >> function,
> >>>>> may
> >>>>> be someone had implemented this algorithm before?
> >>>>>
> >>>> you can use indexes or you can use hstore
> >>>>
> >>>> Regards
> >>>>
> >>>> Pavel Stehule
> >>>>
> >>>>> --
> >>>>> Best regards
> >>>>>
> >>>>> Aleksej Trofimov
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Sent via pgsql-performance mailing list
> >>>>> (pgsql-performance@postgresql.org)
> >>>>> To make changes to your subscription:
> >>>>> http://www.postgresql.org/mailpref/pgsql-performance
> >>>
> >> It is strange - on my comp FOREACH is about 2x faster
> >>
> >> postgres=# select input_value(array(select
> >> generate_series(1,1000000)::numeric), 100000);
> >>   input_value
> >> -------------
> >>
> >> (1 row)
> >>
> >> Time: 495.426 ms
> >>
> >> postgres=# select input_value_fe(array(select
> >> generate_series(1,1000000)::numeric), 100000);
> >>   input_value_fe
> >> ----------------
> >>
> >> (1 row)
> >>
> >> Time: 248.980 ms
> >>
> >> Regards
> >>
> >> Pavel
> >>
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-
> >> performance@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> 
> 
> --
> Best regards
> 
> Aleksej Trofimov
> 
> 
> --
> Sent via pgsql-performance mailing list (pgsql-
> performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

Previous
From: Aleksej Trofimov
Date:
Subject: Re: Postgres array parser
Next
From: Kevin Martyn
Date:
Subject: Re: copy vs. C function