Thread: Btree index extension question
Hi, everybody! I was wonderring if there is somebody out there who could help me with understand how index extensions work... Let me state the problem first. I have many (15) boolean attributes and I need to be able to search the database for entries with any combination of those attributes for being true. For example - find all the entries, where a1=a2=a3=true or find all the entries where a1=a2=a4=true etc... Because there are so many of them (and the database is HUGE), putting every attribute into a separate column and creating a separate index on every possible combination, is really out of the question. So, I was thinking about creating a single int2 column, with each bit representing an attribute - so that, the first query I quoted above would look like "select * from table where attributes & 7 = 7", and the other query would be "select * from table where attributes & 11 = 11' etc... This looked so beautiful to me, but now I am stuck trying to index that table [:-(] I started off, hoping to get away with btrees. I defined an operator >>=(int2,int2) as 'select $1&$2=$2;' It looks nice so far, but then the question is - what do I do with the other operations? By analogy with 'normal' comparison operators, I would do: >> (I know the name is taken [:-)] as 'select not $2 >>= $1' =<< as 'select $2 >>= $1' << as 'select not $1 >>= $2' ... and leave '=' intact. But then I realized, that these set of operators, does not really define a complete order - for example, if I compare, say, 5 and 3: 5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the same time [:-(] So my question is, first of all, is that a problem? Does btree require a complete order defined? Will it work with partial order? Secondly, if it is a problem, perhaps, I am missing something here, assuming that there is no way to define a set of operations to do what I want and provide a completely ordered set (or do I need it to define a perfect complete order - what exactly is required for btree to work? Any ideas?) And finally, if there is just no way I could get away with btrees, can I make an rtree to work for me? Could somebody explain to me (or point me to a doc somewhere) the meaning of the strategies (and requirements - like transitivity etc...) I need for an rtree, and also what support functions (like comparison func in case of a btree) do I need? Thank you very much for your attention. Any input will be greatly appreciated. Dima
Dmitry Tkach <dmitry@openratings.com> writes: > Does btree require a complete order defined? Yes, absolutely. I do not think that btree is a useful index type for your problem. Nor rtree (our rtree implementation can only do 2-D AFAIK). You might have better luck with GIST. In particular contrib/intarray would be worth looking at for inspiration. (You could even use it directly if you cared to change your data representation, I think.) regards, tom lane
Tom Lane wrote: > Nor rtree (our rtree implementation can only do 2-D >AFAIK). > I know that... But, I thought, if I knew the set of requirements for the rtree strategies and support funcs., I could come up with a set of functions that would simulate the 2-D metrics, yet do what I need... (ALL that I need really is to test if a & b = b - I just can't believe there is no simple way to do that!). > > >You might have better luck with GIST. In particular contrib/intarray >would be worth looking at for inspiration. > Yeah... I looked at it... It scared the hell out of me - seemed to be too damn complicated :-) I mean, I just can't affort spending months on this thing - I have a huge project to work on, and this is just one , really small, part of it... Perhaps, if I it was documented a little better, I would be able to make some sense of it in a reasonable time, but just going through all that code and trying to guess what it is used for, doesn't look like an option to me (especially, provided that I am not familiar with postgres sources it is using, so, it is really a lot more stuff for me to go through) :-( >(You could even use it >directly if you cared to change your data representation, I think.) > You mean to use an array of 15 ints instead of a single bit-packed representation? Well... I said before - the table is HUGE (60 million of rows), and these 15 attributes is not the only stuff in it... So, my immediate concern about changing the representation is if I am going to be able to afford so much more disk space... And then, there are other issues... I wanted it to be a btree, to begin with, because these are not the only 15 attributes I need to use in the search criteria - so, I was hoping to use a multicolumn btree, with my bitset just being one of many columns in it... and if I opt for a gist (or even an rtree), that's not an option... Ohh... tough luck - sounds like I am going to have to stick with a "straight" btree index, plus a lot of postprocessing afterwards :-( Dima
Why don't you use the 'varbit' or 'bit' type? Chris ----- Original Message ----- From: "Dmitry Tkach" <dmitry@openratings.com> To: <pgsql-sql@postgresql.org> Sent: Saturday, March 16, 2002 2:20 AM Subject: [SQL] Btree index extension question > Hi, everybody! > > I was wonderring if there is somebody out there who could help me with > understand how index extensions work... > Let me state the problem first. > > I have many (15) boolean attributes and I need to be able to search the > database for entries with any combination of those attributes for being > true. For example - find all the entries, where a1=a2=a3=true or find > all the entries where a1=a2=a4=true etc... > Because there are so many of them (and the database is HUGE), putting > every attribute into a separate column and creating a separate index on > every possible combination, is really out of the question. > So, I was thinking about creating a single int2 column, with each bit > representing an attribute - so that, the first query I quoted above > would look like "select * from table where attributes & 7 = 7", and the > other query would be > "select * from table where attributes & 11 = 11' etc... > > This looked so beautiful to me, but now I am stuck trying to index that > table [:-(] > > I started off, hoping to get away with btrees. > > I defined an operator >>=(int2,int2) as 'select $1&$2=$2;' > It looks nice so far, but then the question is - what do I do with the > other operations? By analogy with 'normal' comparison operators, I would do: > > > >> (I know the name is taken [:-)] as 'select not $2 >>= $1' > =<< as 'select $2 >>= $1' > << as 'select not $1 >>= $2' > ... and leave '=' intact. > > But then I realized, that these set of operators, does not really define > a complete order - for example, if I compare, say, 5 and 3: > 5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the > same time [:-(] > > So my question is, first of all, is that a problem? Does btree require a > complete order defined? Will it work with partial order? > Secondly, if it is a problem, perhaps, I am missing something here, > assuming that there is no way to define a set of operations to do what I > want and provide a completely ordered set (or do I need it to define a > perfect complete order - what exactly is required for btree to work? Any > ideas?) > > And finally, if there is just no way I could get away with btrees, can I > make an rtree to work for me? Could somebody explain to me (or point me > to a doc somewhere) the meaning of the strategies (and requirements - > like transitivity etc...) I need for an rtree, and also what support > functions (like comparison func in case of a btree) do I need? > > Thank you very much for your attention. > Any input will be greatly appreciated. > > Dima > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Christopher Kings-Lynne wrote: >Why don't you use the 'varbit' or 'bit' type? > Because it doesn't solve anything - there is no way to create an index with that type to do the sort of queries I need :-(