Thread: Btree index extension question

Btree index extension question

From

Dmitry Tkach

Date:

15 March 2002, 11:52:05

Hi, everybody!

I was wonderring if there is somebody out there who could help me with 
understand how index extensions work...
Let me state the problem first.

I have many (15) boolean attributes and I need to be able to search the 
database for entries with any combination of those attributes for being 
true. For example - find all the entries, where a1=a2=a3=true or find 
all the entries where a1=a2=a4=true etc...
Because there are so many of them (and the database is HUGE), putting 
every attribute into a separate column and creating a separate index on 
every possible combination, is really out of the question.
So, I was thinking about creating a single int2 column, with each bit 
representing an attribute - so that, the first query I quoted above 
would look like "select * from table where attributes & 7 = 7", and the 
other query would be
"select * from table where attributes & 11 = 11' etc...

This looked so beautiful to me, but now I am stuck trying to index that 
table  [:-(]

I started off, hoping to get away with btrees.

I defined an operator >>=(int2,int2) as 'select $1&$2=$2;'
It looks nice so far, but then the question is - what do I do with the 
other operations? By analogy with 'normal' comparison operators, I would do: 

>> (I know the name is taken  [:-)]  as 'select not $2 >>= $1'
=<<                                  as 'select $2 >>= $1'
<<                                   as 'select not $1 >>= $2'
... and leave '=' intact.

But then I realized, that these set of operators, does not really define 
a complete order - for example, if I compare, say, 5 and 3:
5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the 
same time  [:-(]

So my question is, first of all, is that a problem? Does btree require a 
complete order defined? Will it work with partial order?
Secondly, if it is a problem, perhaps, I am missing something here, 
assuming that there is no way to define a set of operations to do what I 
want and provide a completely ordered set (or do I need it to define a 
perfect complete order - what exactly is required for btree to work? Any 
ideas?)

And finally, if there is just no way I could get away with btrees, can I 
make an rtree to work for me? Could somebody explain to me (or point me 
to a doc somewhere) the meaning of the strategies (and requirements - 
like transitivity etc...) I need for an rtree, and also what support 
functions (like comparison func in case of a btree) do I need?

Thank you very much for your attention.
Any input will be greatly appreciated.

Dima

Re: Btree index extension question

From

Tom Lane

Date:

15 March 2002, 12:07:12

Dmitry Tkach <dmitry@openratings.com> writes:
> Does btree require a complete order defined?

Yes, absolutely.  I do not think that btree is a useful index type for
your problem.  Nor rtree (our rtree implementation can only do 2-D
AFAIK).

You might have better luck with GIST.  In particular contrib/intarray
would be worth looking at for inspiration.  (You could even use it
directly if you cared to change your data representation, I think.)
        regards, tom lane

Re: Btree index extension question

From

Dmitry Tkach

Date:

15 March 2002, 13:03:31

Tom Lane wrote:

>  Nor rtree (our rtree implementation can only do 2-D
>AFAIK).
>
I know that... But, I thought, if I knew the set of requirements for the 
rtree strategies and support funcs., I could come up with a set of 
functions that would simulate the 2-D metrics, yet do what I need... 
(ALL that I need really is to test if a & b = b - I just can't believe 
there is no simple way to do that!).

>
>
>You might have better luck with GIST.  In particular contrib/intarray
>would be worth looking at for inspiration.  
>
Yeah... I looked at it... It scared the hell out of me - seemed to be 
too damn complicated :-)
I mean, I just can't affort spending months on this thing - I have a 
huge project to work on, and this is just one , really small, part of it...
Perhaps, if I it was documented a little better, I would be able to make 
some sense of it in a reasonable time, but just going through all that 
code and trying to guess what it is used for, doesn't look like an 
option to me (especially, provided that I am not familiar with postgres 
sources it is using, so, it is really a lot more stuff for me to go 
through) :-(

>(You could even use it
>directly if you cared to change your data representation, I think.)
>
You mean to use an array of 15 ints instead of a single bit-packed 
representation?
Well... I said before - the table is HUGE (60 million of rows), and 
these 15 attributes is not the only stuff in it... So, my immediate 
concern about changing the representation is if I am going to be able to 
afford so much more disk space...

And then, there are other issues... I wanted it to be a btree, to begin 
with, because these are not the only 15 attributes I need to use in the 
search criteria - so, I was hoping to use a multicolumn btree, with my 
bitset just being one of many columns in it... and if I opt for a gist 
(or even an rtree), that's not an option...

Ohh... tough luck - sounds like I am going to have to stick with a 
"straight" btree index, plus a lot of postprocessing afterwards :-(

Dima

Re: Btree index extension question

From

"Christopher Kings-Lynne"

Date:

16 March 2002, 22:58:00

Why don't you use the 'varbit' or 'bit' type?

Chris

----- Original Message -----
From: "Dmitry Tkach" <dmitry@openratings.com>
To: <pgsql-sql@postgresql.org>
Sent: Saturday, March 16, 2002 2:20 AM
Subject: [SQL] Btree index extension question


> Hi, everybody!
>
> I was wonderring if there is somebody out there who could help me with
> understand how index extensions work...
> Let me state the problem first.
>
> I have many (15) boolean attributes and I need to be able to search the
> database for entries with any combination of those attributes for being
> true. For example - find all the entries, where a1=a2=a3=true or find
> all the entries where a1=a2=a4=true etc...
> Because there are so many of them (and the database is HUGE), putting
> every attribute into a separate column and creating a separate index on
> every possible combination, is really out of the question.
> So, I was thinking about creating a single int2 column, with each bit
> representing an attribute - so that, the first query I quoted above
> would look like "select * from table where attributes & 7 = 7", and the
> other query would be
> "select * from table where attributes & 11 = 11' etc...
>
> This looked so beautiful to me, but now I am stuck trying to index that
> table  [:-(]
>
> I started off, hoping to get away with btrees.
>
> I defined an operator >>=(int2,int2) as 'select $1&$2=$2;'
> It looks nice so far, but then the question is - what do I do with the
> other operations? By analogy with 'normal' comparison operators, I would
do:
>
>
>  >> (I know the name is taken  [:-)]  as 'select not $2 >>= $1'
> =<<                                  as 'select $2 >>= $1'
> <<                                   as 'select not $1 >>= $2'
> ... and leave '=' intact.
>
> But then I realized, that these set of operators, does not really define
> a complete order - for example, if I compare, say, 5 and 3:
> 5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the
> same time  [:-(]
>
> So my question is, first of all, is that a problem? Does btree require a
> complete order defined? Will it work with partial order?
> Secondly, if it is a problem, perhaps, I am missing something here,
> assuming that there is no way to define a set of operations to do what I
> want and provide a completely ordered set (or do I need it to define a
> perfect complete order - what exactly is required for btree to work? Any
> ideas?)
>
> And finally, if there is just no way I could get away with btrees, can I
> make an rtree to work for me? Could somebody explain to me (or point me
> to a doc somewhere) the meaning of the strategies (and requirements -
> like transitivity etc...) I need for an rtree, and also what support
> functions (like comparison func in case of a btree) do I need?
>
> Thank you very much for your attention.
> Any input will be greatly appreciated.
>
> Dima
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Re: Btree index extension question

From

Dima Tkach

Date:

16 March 2002, 23:03:36

Christopher Kings-Lynne wrote:

>Why don't you use the 'varbit' or 'bit' type?
>
Because it doesn't solve anything - there is no way to create an index 
with that type to do the sort of queries I need :-(