Thread: GiST intarray rd-tree indexes using intbig

GiST intarray rd-tree indexes using intbig

From
"Jonathan Gray"
Date:
<div class="Section1"><p class="MsoNormal">Hello all,<p class="MsoNormal"> <p class="MsoNormal">I am working on a
search-relatedproject where scaling is a major issue.  Recently I’ve been experimenting with the beautifully designed
rd-treeindexes and intarray contrib module, and it seems like a great solution for me.<p class="MsoNormal"> <p
class="MsoNormal">I’vehit a few bumps and am looking for clarification from anyone familiar with the implementation of
theintbig versions of intarray.<p class="MsoNormal"> <p class="MsoNormal">It is documented that intbig utilizes 4096
bitsignatures to represent the set nodes in the tree.  However, I am unable to find any reference to a 4kbit signature
inthe code and am wondering where this is implemented.<p class="MsoNormal"> <p class="MsoNormal">Also, is the leaf
comparisonalso a signature comparison like the nodes?  Or is this an exact comparison?  From my understanding of the
code,it doesn’t appear to be an exact comparison.  If this is the case, how can I access the original intarray that is
beingreferenced by this signature?<p class="MsoNormal"> <p class="MsoNormal">Thanks in advance.  I have spent many
hoursdigging through code (and deciphering macros) and need a bit of help to get a better understanding before I can
moveforward.<p class="MsoNormal"> <p class="MsoNormal">Jonathan Gray</div> 

Re: GiST intarray rd-tree indexes using intbig

From
Teodor Sigaev
Date:
> It is documented that intbig utilizes 4096 bit signatures to represent 
> the set nodes in the tree.  However, I am unable to find any reference 
> to a 4kbit signature in the code and am wondering where this is implemented.
_int.h:
/* bigint defines */
#define SIGLENINT  63           /* >122 => key will toast, so very slow!!! */
#define SIGLEN  ( sizeof(int)*SIGLENINT )
#define SIGLENBIT (SIGLEN*BITS_PER_BYTE)

63 /*ints*/ * 4 /* byte per int */ * 8 /* bits per byte */ =  2016 bits
From our experience, using power of 2 number of bits causes bad hashing of array.

You can play with value of SIGLENINT:
- less value decreases index's size, but increase number of false drops, so  index will return more values and pgsql
willdrop it after rechecking of  table's value. That increases table's access
 
- greater value increase index's size but decrease number of false drops.

So, you can find optimal SIGLENINT value for your sets of data.


> Also, is the leaf comparison also a signature comparison like the 
> nodes?  Or is this an exact comparison?  From my understanding of the 

What do you mean: comparison of signatures? RD-Tree doesn't use any comparison 
functions like B-Tree does. Here we use distance function. Distance might be 
defined in different meaning, but we use Hemming distance 
(_intbig_gist.c:hemdistsign) which is number of different bits in signatures.

> code, it doesn’t appear to be an exact comparison.  If this is the case, 
> how can I access the original intarray that is being referenced by this 
> signature?

Index doesn't store original int[] at all. From GiST support fuction there is no 
way to get access to table's value :(.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/