Joel Dudley wrote:
> I am about to write a set of C functions to be used in an aggregate
> function in which the final function performs a calculation on an array
> of accumulated text data types stored in a text[] array. I need to use
> the text type because this function will be used on DNA sequences which
> can be very large. My questions are the following. What is the most
> efficient way to accumulate a text array while being efficient with
> memory? I see construct_array() used in accumulation functions but I am
> worried that I might end up making a copy of a potentially very large
> text array each time my accumulation function is called.

True, but I think the intermediate results should be released after
each row. I'd try it with some real data before assuming there is a
performance problem.

If it does turn out to be a problem, take a look at how contrib/intagg
works. It basically just passes a pointer from call to call, so the
accumulated data is not copied on every row. You could do something
similar for the text data type.
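
To make the pointer-passing idea concrete, here is a rough sketch in
plain C. It is not the intagg source and it does not use the PostgreSQL
API: SeqAccum, seq_accum() and seq_accum_free() are names I made up for
illustration, and malloc/realloc stand in for allocating in a memory
context that lives as long as the aggregate does. The point is only that
each call appends to the same state and hands back the same pointer,
copying nothing but the one new sequence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical state carried from call to call; not a PostgreSQL type. */
typedef struct SeqAccum
{
    size_t  count;      /* sequences stored so far */
    size_t  capacity;   /* slots allocated in items */
    char  **items;      /* private copies of the sequences */
} SeqAccum;

/*
 * Transition step: append one sequence and return the state pointer.
 * Pass NULL as state on the first call. Returns NULL if an allocation
 * fails; a state that was passed in is left untouched in that case.
 */
SeqAccum *
seq_accum(SeqAccum *state, const char *seq)
{
    assert(seq != NULL);

    /* Copy only the new sequence, never the ones already stored. */
    size_t  len = strlen(seq) + 1;
    char   *copy = malloc(len);

    if (copy == NULL)
        return NULL;
    memcpy(copy, seq, len);

    SeqAccum *s = state;

    if (s == NULL)
    {
        /* First call: create an empty state. */
        s = malloc(sizeof *s);
        if (s == NULL)
        {
            free(copy);
            return NULL;
        }
        s->count = 0;
        s->capacity = 0;
        s->items = NULL;
    }

    if (s->count == s->capacity)
    {
        /* Grow the pointer table by doubling; the strings do not move. */
        size_t  newcap = s->capacity ? s->capacity * 2 : 8;
        char  **grown = realloc(s->items, newcap * sizeof *grown);

        if (grown == NULL)
        {
            free(copy);
            if (state == NULL)
                free(s);        /* fresh state, items is still NULL */
            return NULL;
        }
        s->items = grown;
        s->capacity = newcap;
    }

    s->items[s->count++] = copy;
    return s;
}

/* Release the state and every stored copy. */
void
seq_accum_free(SeqAccum *s)
{
    if (s == NULL)
        return;
    for (size_t i = 0; i < s->count; i++)
        free(s->items[i]);
    free(s->items);
    free(s);
}
```

Only the table of pointers is ever reallocated, so the cost per row is
one copy of that row's sequence plus the occasional table growth. In a
real aggregate you would also have to make sure the state is allocated
somewhere that survives between calls; check how intagg handles that.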

> The general flow is
>
> User defined aggregate function
> SELECT pb_distance_k2p(sequence) WHERE family_id = 10;
>
> uses accumulation function
>
> distance_accum(PG_FUNCTION_ARGS);
>
> and uses a final function
>
> calculate_distance_k2p(PG_FUNCTION_ARGS)
>
> which needs to deconstruct_array() to get the text array and loop
> through the array to do some pairwise comparisons of the text and return
> a multidimensional array

Makes sense to me.

BTW, take a look at PL/R:
  http://www.joeconway.com/plr/

It would allow you to write your final function in R, which has many
extensions related to bioinformatics -- see:
  http://www.bioconductor.org/
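
If you do stay in C for the final function, the pairwise loop you
describe could look roughly like the sketch below. Again this is plain
C, not PostgreSQL code: it assumes the array elements have already been
pulled out (e.g. after deconstruct_array()) and turned into
NUL-terminated strings. pairwise_matrix() and mismatch_fraction() are
made-up names, and the mismatch fraction is only a placeholder for
whatever distance you actually want to compute.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Placeholder metric: fraction of positions that differ, measured over
 * the length the two sequences share. Swap in the real calculation.
 */
static double
mismatch_fraction(const char *a, const char *b)
{
    size_t  la = strlen(a);
    size_t  lb = strlen(b);
    size_t  n = la < lb ? la : lb;
    size_t  diff = 0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            diff++;
    return (double) diff / (double) n;
}

/*
 * Build an n x n matrix of pairwise values, stored row-major in one
 * malloc'd block that the caller frees. Returns NULL if n is 0 or the
 * allocation fails.
 */
double *
pairwise_matrix(const char *const *seqs, size_t n)
{
    assert(seqs != NULL);

    if (n == 0)
        return NULL;

    double *m = malloc(n * n * sizeof *m);

    if (m == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
    {
        m[i * n + i] = 0.0;     /* a sequence compared with itself */
        for (size_t j = i + 1; j < n; j++)
        {
            /* The metric is symmetric, so compute each pair once. */
            double  d = mismatch_fraction(seqs[i], seqs[j]);

            m[i * n + j] = d;
            m[j * n + i] = d;
        }
    }
    return m;
}
```

The work grows with the square of the number of sequences, so for
large families the final function is likely to cost more than the
accumulation step.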

HTH,
Joe