Thread: multi-column btree index for real values

multi-column btree index for real values

From
"Martin D. Weinberg"
Date:
Folks,

Can someone quickly describe how the btree is implemented for multiple
columns?  In particular, under what (if any) circumstances is there an
advantage if the index is over floating point values?

Thanks!

--Martin


Re: multi-column btree index for real values

From
Martijn van Oosterhout
Date:
On Thu, Oct 03, 2002 at 02:00:30PM -0400, Martin D. Weinberg wrote:
> Folks,
>
> Can someone quickly describe how the btree is implemented for multiple
> columns?  In particular, under what (if any) circumstances is there an
> advantage if the index is over floating point values?

AFAIK, multi-column btrees and simply handled by building a btree of the
first column. Each leaf contains a reference to another btree for the second
column, etc...

btrees are useful for < and > comparisons, meaning that queries saying WHERE
x BETWEEN 1.0 and 1.5 can use the index.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

Re: multi-column btree index for real values

From
Martin Weinberg
Date:
Martijn,

Thanks.  So that implies that a multidimensional btree index is
useless for two columns of floats (one will probably always
be searching on the first index for a tree of large height).

Let me restate my question as an example.  Supose I have columns
of longitude and latitude.  What is the best indexing strategy to
find all tuples with in a two dimensional bound of longitude and
latitude.  E.g. with where clause

           lat between 21.49 and 37.41 and
               lon between 70.34 and 75.72
--Martin

Martijn van Oosterhout wrote on
Sun, 06 Oct 2002 00:02:58 +1000
>On Thu, Oct 03, 2002 at 02:00:30PM -0400, Martin D. Weinberg wrote:
>> Folks,
>>
>> Can someone quickly describe how the btree is implemented for multiple
>> columns?  In particular, under what (if any) circumstances is there an
>> advantage if the index is over floating point values?
>
>AFAIK, multi-column btrees and simply handled by building a btree of the
>first column. Each leaf contains a reference to another btree for the second
>column, etc...
>
>btrees are useful for < and > comparisons, meaning that queries saying WHERE
>x BETWEEN 1.0 and 1.5 can use the index.
>--
>Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
>> There are 10 kinds of people in the world, those that can do binary
>> arithmetic and those that can't.
>

Re: multi-column btree index for real values

From
Bruce Momjian
Date:
Martin Weinberg wrote:
> Martijn,
>
> Thanks.  So that implies that a multidimensional btree index is
> useless for two columns of floats (one will probably always
> be searching on the first index for a tree of large height).
>
> Let me restate my question as an example.  Supose I have columns
> of longitude and latitude.  What is the best indexing strategy to
> find all tuples with in a two dimensional bound of longitude and
> latitude.  E.g. with where clause
>
>            lat between 21.49 and 37.41 and

Oh, rtree.  That is exactly the index type you want.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: multi-column btree index for real values

From
Martin Weinberg
Date:
Thanks Bruce.  Some simple tests on a 10 million tuple data base shows
that r-tree works well for this. (It took me a while to realize
that I had to sort boxes of zero area rather than points).

However, it seems that the rtree index has a serious memory leak for
7.2.2.  Is that known?

--Martin

Bruce Momjian wrote on Sat, 05 Oct 2002 14:30:47 EDT
>Martin Weinberg wrote:
>> Martijn,
>>
>> Thanks.  So that implies that a multidimensional btree index is
>> useless for two columns of floats (one will probably always
>> be searching on the first index for a tree of large height).
>>
>> Let me restate my question as an example.  Supose I have columns
>> of longitude and latitude.  What is the best indexing strategy to
>> find all tuples with in a two dimensional bound of longitude and
>> latitude.  E.g. with where clause
>>
>>            lat between 21.49 and 37.41 and
>
>Oh, rtree.  That is exactly the index type you want.
>
>--
>  Bruce Momjian                        |  http://candle.pha.pa.us
>  pgman@candle.pha.pa.us               |  (610) 359-1001
>  +  If your life is a hard drive,     |  13 Roberts Road
>  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

Re: multi-column btree index for real values

From
Bruce Momjian
Date:
Martin Weinberg wrote:
> Thanks Bruce.  Some simple tests on a 10 million tuple data base shows
> that r-tree works well for this. (It took me a while to realize
> that I had to sort boxes of zero area rather than points).
>
> However, it seems that the rtree index has a serious memory leak for
> 7.2.2.  Is that known?

Uh, I am not aware of that, but you can try 7.3beta to see if it is
better, and if not, report back.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: multi-column btree index for real values

From
Martin Weinberg
Date:
Ok.  I don't have the disk space to migrate to 7.3 right now.
Our database is about 500 million tuples with about 100 fields.
However, I checked out the beta source and noticed the following new
lines near line 934 in src/backend/access/rtree/rtree.c in 7.2.2:

+                       pfree(DatumGetPointer(union_dl));
+                       pfree(DatumGetPointer(union_dr));

Looks like a leak fix to me.  Patching these in, I found that the leak
is much reduced; still looks like a tiny leak remains.  We will try
7.3beta asap.

Thanks.

Bruce Momjian wrote on Sat, 05 Oct 2002 17:39:45 EDT
>Martin Weinberg wrote:
>> Thanks Bruce.  Some simple tests on a 10 million tuple data base shows
>> that r-tree works well for this. (It took me a while to realize
>> that I had to sort boxes of zero area rather than points).
>>
>> However, it seems that the rtree index has a serious memory leak for
>> 7.2.2.  Is that known?
>
>Uh, I am not aware of that, but you can try 7.3beta to see if it is
>better, and if not, report back.
>
>--
>  Bruce Momjian                        |  http://candle.pha.pa.us
>  pgman@candle.pha.pa.us               |  (610) 359-1001
>  +  If your life is a hard drive,     |  13 Roberts Road
>  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>

Re: multi-column btree index for real values

From
Oleg Bartunov
Date:
On Sat, 5 Oct 2002, Martin Weinberg wrote:

> Thanks Bruce.  Some simple tests on a 10 million tuple data base shows
> that r-tree works well for this. (It took me a while to realize
> that I had to sort boxes of zero area rather than points).
>
> However, it seems that the rtree index has a serious memory leak for
> 7.2.2.  Is that known?

probably, try contrib/rtree_gist

>
> --Martin
>
> Bruce Momjian wrote on Sat, 05 Oct 2002 14:30:47 EDT
> >Martin Weinberg wrote:
> >> Martijn,
> >>
> >> Thanks.  So that implies that a multidimensional btree index is
> >> useless for two columns of floats (one will probably always
> >> be searching on the first index for a tree of large height).
> >>
> >> Let me restate my question as an example.  Supose I have columns
> >> of longitude and latitude.  What is the best indexing strategy to
> >> find all tuples with in a two dimensional bound of longitude and
> >> latitude.  E.g. with where clause
> >>
> >>            lat between 21.49 and 37.41 and
> >
> >Oh, rtree.  That is exactly the index type you want.
> >
> >--
> >  Bruce Momjian                        |  http://candle.pha.pa.us
> >  pgman@candle.pha.pa.us               |  (610) 359-1001
> >  +  If your life is a hard drive,     |  13 Roberts Road
> >  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 2: you can get off all lists at once with the unregister command
> >    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83