Thread: arrays of floating point numbers / linear algebra operations into the DB
Hello,
I'd like to perform linear algebra operations on float4/8 arrays. These tasks are typically carried out using ad hoc optimized libraries (e.g. BLAS). In order to do this, I studied a bit how arrays are stored internally by the DB: from what I understood, arrays are basically a vector of Datum, and floating point numbers are stored by reference inside Datums. At first glance, this seems to settle the matter, because in order to perform fast linear algebra operations you need to store array items in consecutive memory cells.
What are the alternatives? Create a new specialized data type for floating point vectors?
Basically, the use case is to be able to rescale, add and multiply (element by element) vectors.
Thanks for your help,
e.
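For concreteness, over a contiguous C array the three operations amount to nothing more than this (a plain-C sketch; the function names are purely illustrative):

/* y[i] = a * x[i]  -- rescale */
static void vec_scale(int n, double a, const double *x, double *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i];
}

/* z[i] = x[i] + y[i]  -- element-by-element add */
static void vec_add(int n, const double *x, const double *y, double *z)
{
    int i;
    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

/* z[i] = x[i] * y[i]  -- element-by-element multiply */
static void vec_mul(int n, const double *x, const double *y, double *z)
{
    int i;
    for (i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}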
Re: arrays of floating point numbers / linear algebra operations into the DB
From
Colin Wetherbee
Date:
Enrico Sirola wrote:
> Hello,
> I'd like to perform linear algebra operations on float4/8 arrays. These
> tasks are typically carried out using ad hoc optimized libraries (e.g.
> BLAS). In order to do this, I studied a bit how arrays are stored
> internally by the DB: from what I understood, arrays are basically a
> vector of Datum, and floating point numbers are stored by reference
> inside Datums. At first glance, this seems to settle the matter, because
> in order to perform fast linear algebra operations you need to store
> array items in consecutive memory cells.
> What are the alternatives? Create a new specialized data type for
> floating point vectors?
> Basically, the use case is to be able to rescale, add and multiply
> (element by element) vectors.

I'm not sure about the internals of PostgreSQL (e.g. the Datum object you mention), but if you're just scaling vectors, consecutive memory addresses shouldn't be strictly necessary. Add and multiply operations over a linked list (which is how I'm naively assuming array storage is implemented in memory) would be roughly as fast.

How many scaling operations are you planning to execute per second, and how many elements do you scale per operation?

Colin
Re: arrays of floating point numbers / linear algebra operations into the DB
From
Martijn van Oosterhout
Date:
On Fri, Feb 01, 2008 at 11:31:37AM +0100, Enrico Sirola wrote:
> Hello,
> I'd like to perform linear algebra operations on float4/8 arrays.
> These tasks are typically carried out using ad hoc optimized libraries
> (e.g. BLAS). In order to do this, I studied a bit how arrays are
> stored internally by the DB: from what I understood, arrays are
> basically a vector of Datum, and floating point numbers are stored by
> reference inside Datums.

Well, arrays are not vectors of Datum; they are a vector of the objects they contain. When passed to a function, floats, arrays and other by-ref types are passed by reference, but the array object itself does not contain references: it contains the actual objects. That doesn't necessarily make it the same as a C array, though; the alignment considerations may be different. But at first glance it certainly seems like an array would be in the right format for what you're doing.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent
> revolution inevitable.
>   -- John F Kennedy
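For illustration, here is roughly what that layout buys you on the C side: the float8 payload of a 1-D, NULL-free array can be walked as an ordinary C array of double. (A minimal sketch; vec_sum is a made-up example function, and it assumes one dimension and no NULL elements.)

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(vec_sum);

/* Sum the elements of a 1-D float8[]; assumes no NULL elements,
 * so the data is one contiguous block of double behind the header
 * and no per-element Datum indirection is involved. */
Datum
vec_sum(PG_FUNCTION_ARGS)
{
    ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
    int         n = ArrayGetNItems(ARR_NDIM(a), ARR_DIMS(a));
    float8     *x = (float8 *) ARR_DATA_PTR(a);
    float8      s = 0.0;
    int         i;

    for (i = 0; i < n; i++)
        s += x[i];

    PG_RETURN_FLOAT8(s);
}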
Re: arrays of floating point numbers / linear algebra operations into the DB
From
Enrico Sirola
Date:
Hi Colin,

On 01/feb/08, at 15:22, Colin Wetherbee wrote:
> I'm not sure about the internals of PostgreSQL (e.g. the Datum object
> you mention), but if you're just scaling vectors, consecutive memory
> addresses shouldn't be strictly necessary. Add and multiply operations
> over a linked list (which is how I'm naively assuming array storage is
> implemented in memory) would be roughly as fast.

I'm not an expert, but the SSE instruction family should make the difference for this kind of workload, and those instructions work on consecutive memory cells.

> How many scaling operations are you planning to execute per second,
> and how many elements do you scale per operation?

Typically, arrays contain 1000 elements, and an operation is either multiplying an array by a scalar or multiplying it element by element with another array. Rescaling 1000 arrays, multiplying each by another array, and finally summing the 1000 resulting arrays should be fast enough for an interactive application (let's say 0.5 s), assuming no disk access is required. Disk access will obviously degrade performance a bit at the beginning, but the workload is mostly read-only, so after a while the whole table will be cached anyway. The table containing the arrays would be truncated and repopulated every day, and the number of arrays is expected to be more or less 150000 (at least this is what we have now). Nowadays we have a C++ middleware between the calculations and the DB, with aggressive caching of the table contents (and we don't use arrays, just a row per element), but the application could be refactored (and simplified a lot) if we had a smart way to save data into the DB.

Bye,
e.
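To put the workload in code: per array, that is one BLAS scale, one element-by-element product (which has no dedicated BLAS-1 routine, so it stays a plain loop), and one accumulation. A CBLAS sketch of the inner step, with the array length hard-coded purely for illustration:

#include <cblas.h>

#define N 1000

/* One step of the workload described above:
 * acc += (alpha * a) .* b, over 1000-element vectors. */
static void
step(double alpha, double *a, const double *b, double *acc)
{
    double tmp[N];
    int    i;

    cblas_dscal(N, alpha, a, 1);            /* a   = alpha * a */
    for (i = 0; i < N; i++)                 /* tmp = a .* b    */
        tmp[i] = a[i] * b[i];
    cblas_daxpy(N, 1.0, tmp, 1, acc, 1);    /* acc = acc + tmp */
}

Repeating this 1000 times per query is only a few million floating point operations, comfortably within the 0.5 s budget once the data is in memory.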
Re: arrays of floating point numbers / linear algebra operations into the DB
From
"Webb Sprague"
Date:
On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> Hello,
> I'd like to perform linear algebra operations on float4/8 arrays.
> These tasks are typically carried out using ad hoc optimized libraries
> (e.g. BLAS).

If there were a coherently designed, simple, and fast LAPACK/MATLAB style library and set of datatypes for matrices and vectors in Postgres, I think that would be a HUGE plus for the project! I would have used it on a project I am working on in mortality forecasting (I would have been able to put all of my mathematics in the database instead of using scipy); it would tie in beautifully with the GIS and imagery efforts; it would ease fancy statistics calculation on database infrastructure; it would provide useful libraries for the datamining/knowledge-discovery types; etc., etc.

If we just had fast matrix arithmetic, eigen-stuff (including singular value decomposition), convolution, random matrix generation, and table <-> matrix functions, that would be amazing and would provide the material for further library development, since a lot of complex algorithms just fall out when you can do advanced linear algebra. We need to be able to convert transparently between matrices/vectors (which I think should be simple N by 1 matrices by default) and arrays, but we would probably want to go for a separate datatype in order to get speed, since scientifically important matrices can be HUGE.

Just my fairly worthless $0.02, as all I would provide would be testing and membership in the peanut gallery, but there you go. Seems like a perfect Summer of Code project for someone better at C-level programming than me.

-W
Webb Sprague wrote:
> On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
>> I'd like to perform linear algebra operations on float4/8 arrays...
>
> If there were a coherently designed, simple, and fast LAPACK/MATLAB
> style library and set of datatypes for matrices and vectors in
> Postgres, I think that would be a HUGE plus for the project!

I'd also be very excited about this project.

Especially if some GiST or similar index could efficiently search for vectors "close" to other vectors. I assume something like "within an n-dimensional bounding box" would be possible with GiST...

I'd be eager to help, test, debug, etc.; but I'm probably not qualified to take the lead on such a project.
Re: arrays of floating point numbers / linear algebra operations into the DB
From
"Webb Sprague"
Date:
(I had meant also to add that a linear algebra package would help Postgres be the mediator for real-time data from things like temperature sensors, etc., and their relationship to not-so-scientific data, say in a manufacturing environment.)

On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
> Webb Sprague wrote:
> > On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> >> I'd like to perform linear algebra operations on float4/8 arrays...
> >
> > If there were a coherently designed, simple, and fast LAPACK/MATLAB
> > style library and set of datatypes for matrices and vectors in
> > Postgres, I think that would be a HUGE plus for the project!
>
> I'd also be very excited about this project.
>
> Especially if some GiST or similar index could efficiently search
> for vectors "close" to other vectors.

That would be very interesting, as we could play with a multitude of different distance metrics from analysis! Wow!

> I'd be eager to help, test, debug, etc.; but I'm probably not qualified
> to take the lead on such a project.

I almost think the hardest part would be to spec it out and design the interface to the libraries. Once we had that, the libraries are already there, though figuring out how we are going to handle gigabyte-size elements (e.g. a satellite image) will require some finesse, and perhaps some tiling... Hmm. If I get some more interest on this list (I need just one LAPACK/BLAS hacker...), I will apply for a pgFoundry project and appoint myself head of the peanut gallery...
Enrico Sirola wrote:
> Typically, arrays contain 1000 elements, and an operation is either
> multiplying an array by a scalar or multiplying it element by element
> with another array. Rescaling 1000 arrays, multiplying each by another
> array, and finally summing the 1000 resulting arrays should be fast
> enough for an interactive application (let's say 0.5 s), assuming no
> disk access is required. Disk access will obviously degrade performance
> a bit at the beginning, but the workload is mostly read-only, so after
> a while the whole table will be cached anyway. The table containing the
> arrays would be truncated and repopulated every day, and the number of
> arrays is expected to be more or less 150000 (at least this is what we
> have now). Nowadays we have a C++ middleware between the calculations
> and the DB, with aggressive caching of the table contents (and we don't
> use arrays, just a row per element), but the application could be
> refactored (and simplified a lot) if we had a smart way to save data
> into the DB.

I don't know if the speed will meet your needs, but you might test to see if PL/R will work for you:

http://www.joeconway.com/plr/

You could use pg.spi.exec() from within the R procedure to grab the arrays, do all of your processing inside R (which uses whatever BLAS you've set it up to use), and then return the result out to Postgres.

Joe
Re: arrays of floating point numbers / linear algebra operations into the DB
From
Enrico Sirola
Date:
Hi Joe,

> I don't know if the speed will meet your needs, but you might test
> to see if PL/R will work for you:
>
> http://www.joeconway.com/plr/
>
> You could use pg.spi.exec() from within the R procedure to grab the
> arrays, do all of your processing inside R (which uses whatever BLAS
> you've set it up to use), and then return the result out to Postgres.

Thanks a lot for the hint, I'll give it a try. It will also make it much easier to implement a prototype :-)

Bye,
e.
Webb Sprague wrote:
> On Feb 1, 2008 12:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
>> Webb Sprague wrote:
>>> On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
>>>> ...linear algebra ...
>>> ... matrices and vectors .
>> ...Especially if some GiST or similar index could efficiently search
>> for vectors "close" to other vectors...
>
> Hmm. If I get some more interest on this list (I need just one LAPACK
> / BLAS hacker...), I will apply for a pgFoundry project and appoint
> myself head of the peanut gallery...

I think you should start one. I'd be happy to help. I'm rather proficient in C; somewhat literate about postgres' GiST stuff (I think a couple of my bugfix patches were accepted in PostGIS); and I deal with a big database doing lots of similarity-based searches (a 6'2" guy with light brown hair being similar to a 6'1" guy with dark blond hair) - and am experimenting with modeling some of the data as vectors in postgres.
Re: arrays of floating point numbers / linear algebra operations into the DB
From
"Webb Sprague"
Date:
> >>>> ...linear algebra ...
> >>> ... matrices and vectors .
> >> ...Especially if some GiST or similar index could efficiently search
> >> for vectors "close" to other vectors...
> >
> > Hmm. If I get some more interest on this list (I need just one LAPACK
> > / BLAS hacker...), I will apply for a pgFoundry project and appoint
> > myself head of the peanut gallery...
>
> I think you should start one. I'd be happy to help.

OK. You are on. I think designing an interface is the first step, and I am inclined to use MATLAB syntax plus cool things I wish it had (convolution matrices, recycling, etc.).

> I'm rather proficient in C; somewhat literate about postgres' GiST
> stuff (I think a couple of my bugfix patches were accepted in PostGIS);

Nifty! I am having trouble bending my head around how we can fit 10K by 10K matrices into Datums, but if you have worked with PostGIS, then a lot of those big geographic fields might help.

> and I deal with a big database doing lots of similarity-based searches
> (a 6'2" guy with light brown hair being similar to a 6'1" guy with dark
> blond hair) - and am experimenting with modeling some of the data as
> vectors in postgres.

Well, I bet a good linear algebra library would help. A lot. :)
--- Webb Sprague <webb.sprague@gmail.com> wrote:
>>>>> ...linear algebra ...
>>>> ... matrices and vectors .
>>> ...Especially if some GiST or similar index could efficiently search
>>> for vectors "close" to other vectors...

I see a potential problem here, in terms of how one defines "close" or similitude. I think, though, practical answers can be found in examples of applying quantitative methods in some subdisciplines of biology.

> Hmm. If I get some more interest on this list (I need just one LAPACK
> / BLAS hacker...), I will apply for a pgFoundry project and appoint
> myself head of the peanut gallery...

Someone pointed to the potential utility of PL/R. I would be interested at least in learning about your assessment of the two (PostGIS and PL/R). Alas, I don't have decent data I could use to experiment with either (except possibly for time series analysis, which is a completely different kettle of fish).

> > and I deal with a big database doing lots of similarity-based searches
> > (a 6'2" guy with light brown hair being similar to a 6'1" guy with dark
> > blond hair) - and am experimenting with modeling some of the data as
> > vectors in postgres.
>
> Well, I bet a good linear algebra library would help. A lot. :)

If you're looking at similarity, and some practicality in the use of quantitative procedures, you may want to look into the biogeography and numerical taxonomy literature, and to a lesser extent quantitative plant ecology. All three subdisciplines of biology have decades of experience, and copious literature, on similarity measures; in my experience it is much more practical or pragmatic than the 'pure' biostatistics literature, and infinitely more practical than any theoretical statistical or mathematical literature I have seen (trust me, I have a bookcase full of this "stuff").

A good linear algebra library would be useful, but there are a lot of nonlinear analyses that would be of interest, and there are nonparametric yet quantitative approaches that are of considerable interest in assessing similarity. I don't know of work applying things like discriminant function analysis, cluster analysis, or any of the many ordination analyses to searches in a database, but then I haven't looked at the question since I graduated. I am interested in the question, though, and would be interested in hearing about your experience with it.

If I can manage the time, I hope to start a project where I can store description data for specimens of plants and animals and use analyses including, but not limited to, ordination, clustering, discriminant functions, and canonical correlation to create a structure for comparing them and for identifying new specimens - or, at a minimum, if the specimen is truly something unknown, learn what known specimens or groups thereof it is most similar to, and how it is different. I have managed to install PL/R, but I haven't had the time to figure out how best to analyze data stored in the database using it. The data I do have changes daily, and some of the tables are well over 100MB, so I am a bit worried about how well it can handle such an amount of data, and how long it would take.

Cheers,
Ted
Re: arrays of floating point numbers / linear algebra operations into the DB
From
"Webb Sprague"
Date:
On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> Hello,
> I'd like to perform linear algebra operations on float4/8 arrays

Having avoided a bunch of real work by wondering about linear algebra and PG, did you consider the GNU Scientific Library (GSL)? We would still need to hook everything together, but it seems to do a lot of this, is written in C, etc.
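For a taste of the API: GSL's vector type stores its elements contiguously, and the basic operations are one call each (a minimal sketch, independent of Postgres):

#include <gsl/gsl_vector.h>

/* Build a small vector and rescale it in place with GSL. */
int
main(void)
{
    gsl_vector *v = gsl_vector_alloc(3);

    gsl_vector_set(v, 0, 1.0);
    gsl_vector_set(v, 1, 2.0);
    gsl_vector_set(v, 2, 3.0);

    gsl_vector_scale(v, 2.0);   /* v = 2 * v -> {2,4,6} */

    gsl_vector_free(v);
    return 0;
}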
Ted Byers wrote:
> --- Webb Sprague <webb.sprague@gmail.com> wrote:
>>>>>> ...linear algebra ...
>>>>> ... matrices and vectors .
>>>> ...Especially if some GiST or similar index could efficiently search
>>>> for vectors "close" to other vectors...
>
> I see a potential problem here, in terms of how one defines "close" or
> similitude. I think, though, practical answers can be found in examples
> of applying quantitative methods in some subdisciplines of biology.

Even if the best GiST can give is selecting vectors constrained by an n-dimensional bounding box (in much the same way it does for PostGIS), it can help a lot. Then your application can select everything in a conservatively large box and use whatever its favorite metric is to narrow the data further.

> Someone pointed to the potential utility of PL/R. I would be interested
> at least in learning about your assessment of the two (PostGIS and PL/R).

I think they'd be complementary. IMHO, if a native PostgreSQL datatype could allow indexes to narrow the amount of data that needs to be processed, it'd be great to do the rest of the work using R (though we're, perhaps foolishly, using something else in our application).

> If you're looking at similarity, and some practicality in the use of
> quantitative procedures, you may want to look into the biogeography and
> numerical taxonomy literature, and to a lesser extent quantitative
> plant ecology.

Indeed. Though the current literature I wade through is crime analysis. Ideally our software tries to match witness descriptions of a "black SUV" as substantially similar to a "dark green 4Runner" - especially when seen at night; and to understand that a "Glock 19" is a "9mm handgun" and that a "9mm handgun" might be, but isn't necessarily, a "Glock 19". Also, two person records with a tattoo of a cross might contribute a little similarity, but two records with tattoos of Darth Maul / Maori face art contribute much more to the similarity scores because they're so much more distinctive. And of course there are entire companies focused on similarity metrics for fingerprints, DNA, and names (Bill is arguably more similar to William than to Bell). Any magic indexes to help such queries would be very cool; but so far we do most of it in our application logic.

> A good linear algebra library would be useful, but there are a lot of
> nonlinear analyses that would be of interest, and there are
> nonparametric yet quantitative approaches that are of considerable
> interest in assessing similarity.

True, many things don't map cleanly to linear algebra; but it would still be quite useful.

> If I can manage the time, I hope to start a project where I can store
> description data for specimens of plants and animals and use analyses
> including, but not limited to, ordination, clustering, discriminant
> functions, and canonical correlation to create a structure for
> comparing them and for identifying new specimens - or, at a minimum,
> if the specimen is truly something unknown, learn what known specimens
> or groups thereof it is most similar to, and how it is different.

Very cool. Sounds like a somewhat similar issue to mine.
Re: arrays of floating point numbers / linear algebra operations into the DB
From
Enrico Sirola
Date:
Hi Webb, Joe, Martijn,

Webb Sprague wrote:
> On Feb 1, 2008 2:31 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
>> Hello,
>> I'd like to perform linear algebra operations on float4/8 arrays
>
> Having avoided a bunch of real work by wondering about linear algebra
> and PG, did you consider the GNU Scientific Library (GSL)? We would
> still need to hook everything together, but it seems to do a lot of
> this, is written in C, etc.

I experimented a bit today with CBLAS and wrapped the BLAS function for scaling a vector. The following session shows the usage:

create or replace function scale(float8, float8[])
returns float8[]
as '$libdir/linalg', 'scale'
language 'C' immutable strict;

sps_dev=# select scale(k, '{1,2,3}') from generate_series(1,10) k;
   scale
------------
 {1,2,3}
 {2,4,6}
 {3,6,9}
 {4,8,12}
 {5,10,15}
 {6,12,18}
 {7,14,21}
 {8,16,24}
 {9,18,27}
 {10,20,30}
(10 rows)

sps_dev=# create operator * (leftarg=float8, rightarg=float8[], procedure=scale);
sps_dev=# select k * '{1,2,3}'::float8[] from generate_series(1,10) k;
  ?column?
------------
 {1,2,3}
 {2,4,6}
 {3,6,9}
 {4,8,12}
 {5,10,15}
 {6,12,18}
 {7,14,21}
 {8,16,24}
 {9,18,27}
 {10,20,30}
(10 rows)

I'm quite proud, this is my first C extension function ;-)
I'd gladly post the code if it's OK for the list users. It's more or less 100 lines of code. This approach seems promising...

By the way, Webb: I took a look at GSL and it seems to me that, from a linear algebra point of view, it's basically CBLAS, so I'd use CBLAS directly.

Please let me know your thoughts/advice,
e.
Re: arrays of floating point numbers / linear algebra operations into the DB
From
"Webb Sprague"
Date:
> I'm quite proud, this is my first C extension function ;-)
> I'd gladly post the code if it's OK for the list users. It's more or
> less 100 lines of code. This approach seems promising...

I would definitely like to see it.

> By the way, Webb: I took a look at GSL and it seems to me that, from a
> linear algebra point of view, it's basically CBLAS, so I'd use CBLAS
> directly. Please let me know your thoughts/advice,

The only thing about GSL is that it would make it easier to tie into some very sophisticated stuff later, and (I think) its basic linear algebra is probably just as fast as CBLAS, so we could implement that first. It would also be easy to define a big project as "bring GSL to Postgres", and then people could work on pieces. But if you actually write it, you get to decide :)

GSL licensing is GNU-ish, so maybe that is a deal breaker, too.

w
Re: arrays of floating point numbers / linear algebra operations into the DB
From
Enrico Sirola
Date:
Hi Webb,

Webb Sprague wrote:
>> I'm quite proud, this is my first C extension function ;-)
>> I'd gladly post the code if it's OK for the list users. It's more or
>> less 100 lines of code. This approach seems promising...
>
> I would definitely like to see it.

here it goes:

-------------------------->linalg.h<----------------------------------

#ifndef linalg_h
#define linalg_h

#include "postgres.h"
#include "utils/array.h"

Datum scale(PG_FUNCTION_ARGS);

#endif /* linalg_h */

-------------------------->linalg.c<----------------------------------

#include "linalg.h"
#include "fmgr.h"
#include <cblas.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(scale);

Datum
scale(PG_FUNCTION_ARGS)
{
    float8      x = PG_GETARG_FLOAT8(0);
    ArrayType  *v1 = PG_GETARG_ARRAYTYPE_P(1);
    int        *dims1, *lbs1, ndims1, nitems1, ndatabytes1;
    int        *arrlbound1, *arrlbound2;
    char       *arrdatap1, *arrdatap2;
    ArrayType  *rv;

    /* get argument array details */
    lbs1 = ARR_LBOUND(v1);
    dims1 = ARR_DIMS(v1);
    ndims1 = v1->ndim;
    nitems1 = ArrayGetNItems(ndims1, dims1);
    ndatabytes1 = ARR_SIZE(v1) - ARR_DATA_OFFSET(v1);

    if (ndims1 != 1)
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("multidimensional array given"),
                 errdetail("Array has %d dimensions.", ndims1)));
    if (ARR_HASNULL(v1))
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("null in array"),
                 errdetail("Array should not contain null elements.")));

    /* allocate the result vector */
    rv = (ArrayType *) palloc(ndatabytes1);
    SET_VARSIZE(rv, ndatabytes1);
    rv->ndim = v1->ndim;
    rv->dataoffset = v1->dataoffset;        /* no nulls (0) */
    rv->elemtype = v1->elemtype;
    memcpy(ARR_DIMS(rv), ARR_DIMS(v1), sizeof(int));
    arrlbound2 = ARR_LBOUND(rv);
    arrlbound1 = ARR_LBOUND(v1);
    memcpy(arrlbound2, arrlbound1, sizeof(int));
    arrdatap1 = ARR_DATA_PTR(v1);
    arrdatap2 = ARR_DATA_PTR(rv);
    memcpy(arrdatap2, arrdatap1, nitems1 * sizeof(float8));

    /* scale the vector a la BLAS */
    cblas_dscal(nitems1, x, (float8 *) arrdatap2, 1);

    PG_RETURN_ARRAYTYPE_P(rv);
}

--------------------------->linalg.sql<---------------------------------

/* -*-sql-*- */
create or replace function scale(float8, float8[])
returns float8[]
as '$libdir/linalg', 'scale'
language 'C' immutable strict;

create aggregate array_accum (
    sfunc = array_append,
    basetype = anyelement,
    stype = anyarray,
    initcond = '{}'
);

create operator * (leftarg=float8, rightarg=float8[], procedure=scale);

-------------------------------->end<------------------------------------

> GSL licensing is GNU-ish, so maybe that is a deal breaker, too.

Well, I don't know. This is just a proof of concept. Anyway, yes, there could be problems with the GPL.

On the above code: coupling CBLAS functions with PG float8[] seems easy; you just have to know which black-magic macros to use in order to access the data structure. It took me a while to figure out how it works (I'm not actually sure I understood everything, but at least the above code seems to work). The problem with the code above is that it doesn't work for vectors longer than 1000 elements or so (try it with 2000 and it doesn't). I guess I should handle the "toasting" machinery in some way - any suggestion is appreciated.

Bye,
e.
Re: arrays of floating point numbers / linear algebra operations into the DB
From
Enrico Sirola
Date:
I reply to myself:

Enrico Sirola wrote:
[...]
> seems to work). The problem with the code above is that it doesn't work
> for vectors longer than 1000 elements or so (try it with 2000 and it
> doesn't). I guess I should handle the "toasting" machinery in some way -
> any suggestion is appreciated

Wrong. It was just that I had forgotten to add ARR_OVERHEAD_NONULLS(ndims1) to the memory allocation for rv:

rv = (ArrayType *) palloc(ndatabytes1);

becomes

rv = (ArrayType *) palloc(ndatabytes1 + ARR_OVERHEAD_NONULLS(ndims1));

and now it seems to work :)
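Note that the SET_VARSIZE call must be given the same total size - an array datum's varlena length covers the header as well as the data - so the corrected allocation, written out in full against the code above, looks like this:

int nbytes = ndatabytes1 + ARR_OVERHEAD_NONULLS(ndims1);

/* allocate header + data, and record the full datum size */
rv = (ArrayType *) palloc(nbytes);
SET_VARSIZE(rv, nbytes);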