Thread: cube operations

cube operations

From: ABHANG RANE
Date: 16 May 2007, 17:25:11

Hi,
I have an array column that holds 12 real values. Basically these
values represent the co-ordinates of a substance in 12 dimensions. My
main need is to find substances similar to a particular compound. I can
do this by calculating the difference from each array in the whole
table, but the table has millions of rows, so I need some kind of
higher-dimensional index. I have read about the cube operation in
PostgreSQL; can it be extended to 12 dimensions or something like that?
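A minimal Python sketch of the full scan described above. It assumes the table has been loaded into memory as hypothetical (substance_id, coordinates) pairs and that "similar" means Euclidean distance within a chosen radius; the post does not say which distance is meant. Every row is visited, which is exactly the cost an index is meant to avoid.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two coordinate vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_by_scan(rows, query, radius):
    """Compare the query against every row and keep rows within `radius`.

    `rows` is a hypothetical in-memory stand-in for the table:
    an iterable of (substance_id, coordinates) pairs.
    Cost is O(n * d) for n rows of d dimensions.
    """
    hits = []
    for substance_id, coords in rows:
        d = euclidean(coords, query)
        if d <= radius:
            hits.append((substance_id, d))
    # Closest matches first.
    return sorted(hits, key=lambda hit: hit[1])

# Tiny example with 12-dimensional points.
rows = [
    (1, [0.0] * 12),
    (2, [1.0] + [0.0] * 11),
    (3, [5.0] * 12),
]
print(similar_by_scan(rows, [0.0] * 12, radius=2.0))  # [(1, 0.0), (2, 1.0)]
```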

Thanks
Abhang

Re: cube operations

From: "Filip Rembiałkowski"
Date: 16 May 2007, 18:45:32

2007/5/16, ABHANG RANE <arane@indiana.edu>:
> Hi,
> I have an array column that holds 12 real values. Basically these
> values represent the co-ordinates of a substance in 12 dimensions. My
> main need is to find substances similar to a particular compound. I can
> do this by calculating the difference from each array in the whole
> table, but the table has millions of rows, so I need some kind of
> higher-dimensional index. I have read about the cube operation in
> PostgreSQL; can it be extended to 12 dimensions or something like that?

Don't know if this helps, but have a look at intarray:
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/intarray/
If you feel brave you could take this code and try to write some
proximity- or similarity-checking functions in C to speed up the
calculations.

Also consider representing the values as integers, since integer
operations are generally faster.
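A small Python sketch of the integer idea, assuming a fixed-point encoding with a hypothetical scale of 1000 (three decimal places); the right scale depends on the precision of the data. Python itself will not show the speed-up meant here (any gain would come from C code, such as a modified intarray), so this only illustrates the representation and the rounding it introduces.

```python
SCALE = 1000  # hypothetical: keep three decimal places

def quantize(coords, scale=SCALE):
    """Encode real coordinates as integers by fixed-point scaling.

    Each value is rounded, so it can be off by up to 0.5 / scale.
    """
    return [round(x * scale) for x in coords]

def int_sq_distance(a, b):
    """Squared Euclidean distance on integer vectors.

    Skipping the square root keeps everything in integer arithmetic and
    does not change which point is closest, because sqrt is monotonic.
    """
    return sum((x - y) * (x - y) for x, y in zip(a, b))

print(quantize([0.1234, 2.5, -1.0]))    # [123, 2500, -1000]
print(int_sq_distance([0, 0], [3, 4]))  # 25
```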


--
Filip Rembiałkowski

Re: cube operations

From: "John D. Burger"
Date: 16 May 2007, 23:14:32

ABHANG RANE wrote:

> I have an array column that holds 12 real values. Basically
> these values represent the co-ordinates of a substance in 12
> dimensions. My main need is to find substances similar to a
> particular compound. I can do this by calculating the difference
> from each array in the whole table, but the table has millions of
> rows, so I need some kind of higher-dimensional index.

Is there any particular reason you're using an array?  If every row
has all twelve values, I'd just make them columns.  Then I could use
a multi-column index.
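With one column per coordinate, a similarity search can be written as a per-column range test (a "box") followed by an exact distance check. A Python sketch of that two-step idea, assuming Euclidean distance, hypothetical column names c1 ... c12, and an in-memory list of hypothetical (substance_id, coordinates) pairs: the box never drops a true match, but it can admit corner points that the second step removes. Per the PostgreSQL documentation on multicolumn indexes, with range conditions on every column a B-tree on (c1, ..., c12) limits the scanned portion of the index only by the leading column's range; the other conditions are checked inside the index but do not shrink the part that is scanned.

```python
import math

def in_box(coords, query, radius):
    """Per-column range test: |x_i - q_i| <= radius in every dimension.

    This mirrors a predicate such as
    c1 BETWEEN q1 - r AND q1 + r AND ... AND c12 BETWEEN q12 - r AND q12 + r
    (column names hypothetical). Any point within Euclidean distance
    `radius` of the query passes it.
    """
    return all(abs(x - q) <= radius for x, q in zip(coords, query))

def box_then_refine(rows, query, radius):
    """Cheap box filter first, exact Euclidean check second."""
    candidates = [(sid, c) for sid, c in rows if in_box(c, query, radius)]
    return [sid for sid, c in candidates if math.dist(c, query) <= radius]

rows = [
    (1, [0.0] * 12),
    (2, [1.0] * 12),          # inside the box, but sqrt(12) away
    (3, [0.9] + [0.0] * 11),  # inside the box and within the radius
]
print(box_then_refine(rows, [0.0] * 12, radius=1.0))  # [1, 3]
```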

> I have read about the cube operation in PostgreSQL; can it be
> extended to 12 dimensions or something like that?

I have no experience with CUBE, but I think it's just a kind of
summarization aggregate.

It sounds like you want the Nearest Neighbor(s) of your "particular
compound".  You might want to read about that:

http://en.wikipedia.org/wiki/Nearest_neighbor_search
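For reference, the contrib/cube module in PostgreSQL defines a `cube` data type for multidimensional points and boxes, and that type can be indexed with GiST. The simplest exact nearest-neighbour search needs no index at all, though. A Python sketch, assuming Euclidean distance and hypothetical (substance_id, coordinates) pairs; it still reads every row:

```python
import heapq
import math

def k_nearest(rows, query, k):
    """Exact k-nearest-neighbour search by brute force.

    heapq.nsmallest keeps at most k candidates while scanning, so memory
    stays small, but every row is still read: O(n log k) time.
    """
    return heapq.nsmallest(k, rows, key=lambda row: math.dist(row[1], query))

rows = [
    (1, [0.0] * 12),
    (2, [2.0] + [0.0] * 11),
    (3, [1.0] + [0.0] * 11),
]
print([sid for sid, _ in k_nearest(rows, [0.0] * 12, k=2)])  # [1, 3]
```

An index-backed search aims to return the same answer while reading far fewer rows.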

- John Burger
   G63

Re: cube operations

From: Oleg Bartunov
Date: 17 May 2007, 02:15:29

Hacking contrib/intarray could help you. You would need to add a
function that returns the number of overlapping elements.
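contrib/intarray works on integer arrays, and an exact-equality overlap between arrays of real coordinates would almost always be zero, so some discretisation is needed first. A Python sketch of one possible scheme, which is an assumption and not part of intarray: encode each coordinate as a (dimension, bucket) token using a hypothetical bucket width, then count the tokens two substances share.

```python
import math

def to_tokens(coords, bucket_width):
    """Encode each coordinate as a (dimension, bucket) pair.

    Two substances share a token when they fall into the same bucket of
    the same dimension. `bucket_width` is a hypothetical tuning knob.
    """
    return {(i, math.floor(x / bucket_width)) for i, x in enumerate(coords)}

def overlap_count(a_tokens, b_tokens):
    """Number of shared tokens: a crude similarity score (higher is closer)."""
    return len(a_tokens & b_tokens)

a = to_tokens([0.1, 0.5, 2.0], bucket_width=1.0)
b = to_tokens([0.4, 1.2, 2.3], bucket_width=1.0)
print(overlap_count(a, b))  # 2
```

The score is approximate: two close values on either side of a bucket boundary do not count as overlapping.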

Oleg

On Wed, 16 May 2007, John D. Burger wrote:

> ABHANG RANE wrote:
>
>> I have an array column that holds 12 real values. Basically these
>> values represent the co-ordinates of a substance in 12 dimensions. My
>> main need is to find substances similar to a particular compound. I can
>> do this by calculating the difference from each array in the whole
>> table, but the table has millions of rows, so I need some kind of
>> higher-dimensional index.
>
> Is there any particular reason you're using an array?  If every row has all
> twelve values, I'd just make them columns.  Then I could use a multi-column
> index.
>
>> I have read about the cube operation in PostgreSQL; can it be extended
>> to 12 dimensions or something like that?
>
> I have no experience with CUBE, but I think it's just a kind of summarization
> aggregate.
>
> It sounds like you want the Nearest Neighbor(s) of your "particular
> compound".  You might want to read about that:
>
> http://en.wikipedia.org/wiki/Nearest_neighbor_search
>
> - John Burger
> G63
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>     subscribe-nomail command to majordomo@postgresql.org so that your
>     message can get through to the mailing list cleanly

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: cube operations

From: ABHANG RANE
Date: 17 May 2007, 15:06:06

Hi,
But with 12 columns and a multicolumn index, won't this slow down the
search? In general, is retrieving 12 columns through a multicolumn
index slower or faster than using an index on a 12-element array?

Thanks
Abhang
Quoting "John D. Burger" <john@mitre.org>:

> ABHANG RANE wrote:
>
>> I have a array column which has 12 real values in it. Basically
>> these values represent co-ordinates in 12 dimensions for a
>> substance. My main need is to find substances similar to a
>> particular compound. Now I can do by calculating differences with
>> each array in the whole table. But the table has millions of rows.
>> So I need some kinda higher dimensional index.
>
> Is there any particular reason you're using an array?  If every row
> has all twelve values, I'd just make them columns.  Then I could use
> a multi-column index.
>
>> I have read about the cube operation in PostgreSQL; can it be
>> extended to 12 dimensions or something like that?
>
> I have no experience with CUBE, but I think it's just a kind of
> summarization aggregate.
>
> It sounds like you want the Nearest Neighbor(s) of your "particular
> compound".  You might want to read about that:
>
> http://en.wikipedia.org/wiki/Nearest_neighbor_search
>
> - John Burger
>   G63