Thread: Btree index extension question
Hi, everybody! I was wonderring if there is somebody out there who could help me with understand how index extensions work... Let me state the problem first. I have many (15) boolean attributes and I need to be able to search the database for entries with any combination of those attributes for being true. For example - find all the entries, where a1=a2=a3=true or find all the entries where a1=a2=a4=true etc... Because there are so many of them (and the database is HUGE), putting every attribute into a separate column and creating a separate index on every possible combination, is really out of the question. So, I was thinking about creating a single int2 column, with each bit representing an attribute - so that, the first query I quoted above would look like "select * from table where attributes & 7 = 7", and the other query would be "select * from table where attributes & 11 = 11' etc... This looked so beautiful to me, but now I am stuck trying to index that table [:-(] I started off, hoping to get away with btrees. I defined an operator >>=(int2,int2) as 'select $1&$2=$2;' It looks nice so far, but then the question is - what do I do with the other operations? By analogy with 'normal' comparison operators, I would do: >> (I know the name is taken [:-)] as 'select not $2 >>= $1' =<< as 'select $2 >>= $1' << as 'select not $1 >>= $2' ... and leave '=' intact. But then I realized, that these set of operators, does not really define a complete order - for example, if I compare, say, 5 and 3: 5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the same time [:-(] So my question is, first of all, is that a problem? Does btree require a complete order defined? Will it work with partial order? Secondly, if it is a problem, perhaps, I am missing something here, assuming that there is no way to define a set of operations to do what I want and provide a completely ordered set (or do I need it to define a perfect complete order - what exactly is required for btree to work? Any ideas?) And finally, if there is just no way I could get away with btrees, can I make an rtree to work for me? Could somebody explain to me (or point me to a doc somewhere) the meaning of the strategies (and requirements - like transitivity etc...) I need for an rtree, and also what support functions (like comparison func in case of a btree) do I need? Thank you very much for your attention. Any input will be greatly appreciated. Dima
Hi, Don't know enough about postgresql to be sure about this, but here goes: If postgresql does bitwise operations, then you can use that instead of defining new operators. Just construct a number for all the columns that need to be true and do a bitwise 'and' with the stored value. (eg. (7 & stored_val) = 7) If postgresql uses an index to supply functions with their parameters, then make a function that'll do the comparison for you and use it in your query. Or make the index (on all the columns) and make a function that takes all the columns as the parameters to compare against (and ofcourse the values that you want to check against). That way you always use the columns of the index in the correct order. Somebody please check this, as I may have been hit with a stupid-stick. On Fri, 15 Mar 2002, Dmitry Tkach wrote: > Hi, everybody! > > I was wonderring if there is somebody out there who could help me with > understand how index extensions work... > Let me state the problem first. > > I have many (15) boolean attributes and I need to be able to search the > database for entries with any combination of those attributes for being > true. For example - find all the entries, where a1=a2=a3=true or find > all the entries where a1=a2=a4=true etc... > Because there are so many of them (and the database is HUGE), putting > every attribute into a separate column and creating a separate index on > every possible combination, is really out of the question. > So, I was thinking about creating a single int2 column, with each bit > representing an attribute - so that, the first query I quoted above > would look like "select * from table where attributes & 7 = 7", and the > other query would be > "select * from table where attributes & 11 = 11' etc... > > This looked so beautiful to me, but now I am stuck trying to index that > table [:-(] > > I started off, hoping to get away with btrees. > > I defined an operator >>=(int2,int2) as 'select $1&$2=$2;' > It looks nice so far, but then the question is - what do I do with the > other operations? By analogy with 'normal' comparison operators, I would do: > > >> (I know the name is taken [:-)] as 'select not $2 >>= $1' > =<< as 'select $2 >>= $1' > << as 'select not $1 >>= $2' > .. and leave '=' intact. > > But then I realized, that these set of operators, does not really define > a complete order - for example, if I compare, say, 5 and 3: > 5 & 3 = 1, 3 & 5 = 1, so I get BOTH 5 << 3 and 5 >> 3 being true at the > same time [:-(] > > So my question is, first of all, is that a problem? Does btree require a > complete order defined? Will it work with partial order? > Secondly, if it is a problem, perhaps, I am missing something here, > assuming that there is no way to define a set of operations to do what I > want and provide a completely ordered set (or do I need it to define a > perfect complete order - what exactly is required for btree to work? Any > ideas?) > > And finally, if there is just no way I could get away with btrees, can I > make an rtree to work for me? Could somebody explain to me (or point me > to a doc somewhere) the meaning of the strategies (and requirements - > like transitivity etc...) I need for an rtree, and also what support > functions (like comparison func in case of a btree) do I need? > > Thank you very much for your attention. > Any input will be greatly appreciated. > > Dima > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
fcanedo@hotpop.com wrote: > >If postgresql does bitwise operations, then you can use that instead of >defining new operators. Just construct a number for all the columns that >need to be true and do a bitwise 'and' with the stored value. (eg. (7 & >stored_val) = 7) > Yeah... The thing is that I want to be able to the index. And to use the index, I need BOOLEAN operators (this seems to be the LEAST of my problems,but anyway) - so, I have to define 'wrappers' around the standard bitwise operations - e.g. a <<= b ---> a & b = a; > > >If postgresql uses an index to supply functions with their parameters, >then make a function that'll do the comparison for you and use it in your >query. > Well ... that's the point - can't do that :-( You can create functional indexes in postgres (and anywhere else AFAIK), but the function must take a SINGLE parameter. In other words, the only way to do what I need would be to create 15 functions, like: check_bit_1 (x) return x & 1 = 1; check_bit_2 (x) return x & 2 = 2; etc... And then create 15 different indexes (one for each func). But even that would not be of much help, because I need to search by a COMBINATION of parameters, and need a COMPOUND index to do that, not a separate index for each attr... > Or make the index (on all the columns) and make a function that >takes all the columns as the parameters to compare against (and ofcourse >the values that you want to check against). That way you always use the >columns of the index in the correct order. > I am not sure I understand this suggestion... If I make the index on all the columns, I would need to specify all the (leftmost) values in the search criteria to be able to use it, right? For example, suppose, I have an index on (a,b,c) - then select * from foo where a=bar and b=bar will work, but select * from foo where b=bar and c=bar will not... That's exactly my problem - I need to be able to search by any combination of the values - (a),(b),(c),(ab),(ac),(bc),(abc) ... only I have 15 of them - too many combinations to consider buidling indexes for any of them :-(
On Fri, 15 Mar 2002, Dmitry Tkach wrote: > fcanedo@hotpop.com wrote: > > > > >If postgresql does bitwise operations, then you can use that instead of > >defining new operators. Just construct a number for all the columns that > >need to be true and do a bitwise 'and' with the stored value. (eg. (7 & > >stored_val) = 7) > > > > Yeah... The thing is that I want to be able to the index. And to use the > index, I need BOOLEAN > operators (this seems to be the LEAST of my problems,but anyway) - so, I > have to define 'wrappers' around the standard bitwise operations - e.g. > a <<= b ---> a & b = a; Ok, let's see whether we understand each other: 1. Make a column that contains the bitstring of your 15 boolean columns. Let's call it bitstring. 2. bitstring is calculated on each insert or update by a trigger. 3. Make an index on bitstring. 4. Make a query to find the records that have a1, a2 and a3 set to true, like so: SELECT * FROM table where (bitstring & 7) = 7; Will this not give you the correct answer and use the index on bitstring?
> > > >SELECT * FROM table where (bitstring & 7) = 7; > >Will this not give you the correct answer and use the index on bitstring? > Yes, and know... (Yes, it will give th ecorrect answer, and NO it will not use the index). The thing is, that it will only use a (btree) index for one of the 'comparison' operators (<=, <, =, >, >=), that compare the value in the table with the query parameter - so, bitstring = 7, or bitstring < 7 etc... will work, but 'do_something_to_bitstring = 7' will not (I mean, it will, but it won't use the index). So, the idea is to redefine those comparison operators for the index, so that I could use one of them fopr my purpose - for example (as I mentioned in the original message), I was trying to define a '<<=' ("less or equals") operator as a <<= b iff a & b = a. My problem is that (as far as I understand) the way I am doing it will not work (my set of operators doesn't define a strict ordering, so that not any pair of values is comparable) :-( And what I am looking for is either a suggestion of how this strict ordering can be defined, or a description of what exactly the btree needs to work (I suspect, that completely strict ordering is not really necesssary, and that I could get away with some kind of a partial order, if I new exactly what the btree stuff is looking for)... Dima
On Fri, 15 Mar 2002, Dmitry Tkach wrote: ... > Yes, and know... (Yes, it will give th ecorrect answer, and NO it will > not use the index). > The thing is, that it will only use a (btree) index for one of the > 'comparison' operators > (<=, <, =, >, >=), that compare the value in the table with the query > parameter - so, > bitstring = 7, or bitstring < 7 etc... will work, but > 'do_something_to_bitstring = 7' will not (I mean, it will, but it won't > use the index). Okay, I'm recovering from my encounter with the stupid stick. I understand this: 1. You want to use a btree index because presumably it's faster than a normal index. 2. A btree index is a binary tree index that uses the order of values to find an answer quickly. 3. In your case for instance: a value of 10 should produce a resultset with bitstrings of 10, 11, 14, 26, ... So yeah, since 12 and 13 are > 10, normal equality operators won't work. If a solution were to be found, I think it should base the ordering on the bits from least significant bit to most significant. So, for 1st bit go left in the tree for a 0 and right for a 1 and same for next bit. But no, this won't work since you don't care about the other boolean columns, solong as the ones your looking for are true! To clarify, if you don't care about the first column/bit, values from both the left and the right parts of the binary tree can be valid. Wild guess: base the ordering on the number of bits and then try to narrow it down in the select with an equal operator. SELECT * FROM TABLE WHERE bitstring > 7 AND bitstring = 7; Wouldn't know how to get this to work though! :( Oracle has a special index type for just this case. I think they call them bitmap indexes.
fcanedo@hotpop.com wrote: > > >I understand this: >1. You want to use a btree index because presumably it's faster than a >normal index. > Ahhh.. What do you mean by 'normal index'? I have two reasons to use btree: - It is simpler to implement; - It can be used in combination with other columns (these 15 boolean flags is not the only stuff I need to search by, so, I am going to have to run queries like '... where flags & 33 = 33 and foo=bar'' - to do that, I would need an index on (flags 'bit_ops', foo), where bit_ops is my special set of operations I am looking to define). If I am not using btree for the flags, I would then have to define the whole indexing strategy for all my columns, and I am really hoping to be able to avoid having to do that. > >2. A btree index is a binary tree index that uses the order of values to >find an answer quickly. > Yep. > >3. In your case for instance: a value of 10 should produce a resultset >with bitstrings of 10, 11, 14, 26, ... > Yep. > >Wouldn't know how to get this to work though! :( > Yeah... me neither :-( I am about to give up on this (hate to do that!)... I was trying to make use of a GiST index now, but having some problems with that too :-( Could you please look at the next message I am about to post in a few minutes (about debugging C functions)? Perhaps, you would have some helpful ideas there? Thanks! Dima