Thread: speeding up CUBE queries

speeding up CUBE queries

From

Rajarshi Guha

Date:

08 October 2007, 23:53:22

Hi, I'm running Postgres 8.2.4 and have installed the CUBE extension.
Using this I had a 10M row table populated with 12-dimensional zero-
volume cubes (i.e., 12D points).

My queries are of the form

select * from ctab where '(x1,x2,x3,...,x12),
(y1,y2,y3,...,y12)'::cube @> cubeField;

So essentially I am asking for all rows that lie within the specified
bounding box. Now the cubeField column has a GiST index on it. As a
result in a number of cases I can get results in less than a minute
(and in a few cases under 15 sec).

Now 1 minute is relatively long, but it's acceptable (but faster is
always better). However there are some instances when a query takes 4
to 5 minutes. This is problematic, but I'm not sure I see a solution.

One thing that I did observe is that the very long (4-minute) queries
occur when the bounding box is very densely filled with points (based
on knowledge of the dataset). Very fast queries occur when the
bounding box is quite sparsely filled. Now it is also true that the
12D space is not uniformly populated, so that probably has an effect.

I have altered the statistics count on the cubeField column to 100 so
that vacuum analyze performs larger sampling, but that doesn't seem
to help the timings for the more extreme queries.

Could anybody point me to a strategy to improve performance of the
index on this dataset? I have done some searching wrt GIS databases
and I found a post (http://postgis.refractions.net/pipermail/postgis-
users/2002-October/001526.html) which talked about tuning a spatial
index - but that's for 2D and it's not clear whether those ideas were
implemented (and if so, whether they could be applied to my problem)

Are there any options to improving performance on the 12D case? Or am
I stuck?

Thanks,

-------------------------------------------------------------------
Rajarshi Guha  <rguha@indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04  06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Accuracy, n.:
    The vice of being right

Re: speeding up CUBE queries

From

Oleg Bartunov

Date:

09 October 2007, 01:15:58

I'm not sure, but probably mtree will be better ?

On Mon, 8 Oct 2007, Rajarshi Guha wrote:

> Hi, I'm running Postgres 8.2.4 and have installed the CUBE extension. Using
> this I had a 10M row table populated with 12-dimensional zero-volume cubes
> (i.e., 12D points).
>
> My queries are of the form
>
> select * from ctab where '(x1,x2,x3,...,x12), (y1,y2,y3,...,y12)'::cube @>
> cubeField;
>
> So essentially I am asking for all rows that lie within the specified
> bounding box. Now the cubeField column has a GiST index on it. As a result in
> a number of cases I can get results in less than a minute (and in a few cases
> under 15 sec).
>
> Now 1 minute is relatively long, but it's acceptable (but faster is always
> better). However there are some instances when a query takes 4 to 5 minutes.
> This is problematic, but I'm not sure I see a solution.
>
> One thing that I did observe is that the very long (4-minute) queries occur
> when the bounding box is very densely filled with points (based on knowledge
> of the dataset). Very fast queries occur when the bounding box is quite
> sparsely filled. Now it is also true that the 12D space is not uniformly
> populated, so that probably has an effect.
>
> I have altered the statistics count on the cubeField column to 100 so that
> vacuum analyze performs larger sampling, but that doesn't seem to help the
> timings for the more extreme queries.
>
> Could anybody point me to a strategy to improve performance of the index on
> this dataset? I have done some searching wrt GIS databases and I found a post
> (http://postgis.refractions.net/pipermail/postgis-users/2002-October/001526.html)
> which talked about tuning a spatial index - but that's for 2D and it's not
> clear whether those ideas were implemented (and if so, whether they could be
> applied to my problem)
>
> Are there any options to improving performance on the 12D case? Or am I
> stuck?
>
> Thanks,
>
> -------------------------------------------------------------------
> Rajarshi Guha  <rguha@indiana.edu>
> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04  06F7 1BB9 E634 9B87 56EE
> -------------------------------------------------------------------
> Accuracy, n.:
>     The vice of being right
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: speeding up CUBE queries

From

Rajarshi Guha

Date:

09 October 2007, 02:15:56

On Oct 8, 2007, at 9:15 PM, Oleg Bartunov wrote:

> I'm not sure, but probably mtree will be better ?

Thanks - do you have any pointers to this? Google only turns up an
mtree utility that seems to be related to file system operations

-------------------------------------------------------------------
Rajarshi Guha  <rguha@indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04  06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Disembowelling takes guts.