Home > mailing lists

array_accum aggregate - Mailing list pgsql-hackers

From	Stephen Frost
Subject	array_accum aggregate
Date	October 6, 2006 20:34:59
Msg-id	20061006203453.GJ24675@kenobi.snowman.net Whole thread Raw
Responses	Re: array_accum aggregate Re: array_accum aggregate
List	pgsql-hackers

Tree view

Greetings,

  The array_accum example aggregate in the user documentation works
  reasonably on small data sets but doesn't work too hot on large ones.
  http://www.postgresql.org/docs/8.1/static/xaggr.html

  Normally I wouldn't care particularly much but it turns out that PL/R
  uses arrays for quite a bit (eg: histograms and other statistics
  functions).  I've also heard other complaints about the performance of
  arrays, though I'm not sure if those were due to array_accum or
  something else.

  Long story short, I set out to build a faster array_accum.  Much to my
  suprise and delight, we already *had* one.  accumArrayResult() and
  makeArrayResult()/construct_md_array() appear to do a fantastic job.
  I've created a couple of 'glue' functions to expose these functions so
  they can be used in an aggregate.  I'm sure they could be improved
  upon and possibly made even smaller than they already are (90 lines
  total for both) but I'd like to throw out the idea of including them
  in core.  The aggregate created with them could also be considered for
  inclusion though I'm less concerned with that.  I don't expect general
  PostgreSQL users would have trouble creating the aggregate- I don't
  know that the average user would be able or willing to write the C
  functions.

  For comparison, the new functions run with:
  time psql -c "select aaccum(generate_series) from generate_series(1,1000000);" > /dev/null
      4.24s real     0.34s user     0.06s system

  Compared to:
  time psql -c "select array_accum(generate_series) from generate_series(1,1000000);" > /dev/null
  ...

  Well, it's still running and it's been over an hour.

  The main differences, as I see it, are: accumArrayResult() works in
  chunks of 64 elements, and uses repalloc().  array_accum uses
  array_set() which works on individual elements and uses
  palloc()/memcpy().  I appriciate that this is done because for most
  cases of array_set() it's not acceptable to modify the input and am
  not suggesting that be changed.  An alternative might be to modify
  array_set() to check if it is in an aggregate and change its behavior
  but adding the seperate functions seemed cleaner and much less
  intrusive to me.

  Please find the functions attached.

          Thanks,

            Stephen

Attachment

aaccum.c

pgsql-hackers by date:

From: Tom Lane
Date: 06 October 2006, 20:10:58
Subject: Re: Should there be a charcat?

From: Peter Eisentraut
Date: 06 October 2006, 20:47:55
Subject: Re: 8.2 translation status?

array_accum aggregate - Mailing list pgsql-hackers

Attachment

Previous

Next