Re: Bloom Indexes - bit array length and the total number of bits (orhash functions ?? ) ! - Mailing list pgsql-hackers

From Avinash Kumar
Subject Re: Bloom Indexes - bit array length and the total number of bits (orhash functions ?? ) !
Date
Msg-id CAN0Tujf-aLrAtr0Hf6jXGeQucRZG56+EtgYZ4zJSGiByss3XJw@mail.gmail.com
Whole thread Raw
In response to Re: Bloom Indexes - bit array length and the total number of bits(or hash functions ?? ) !  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: Bloom Indexes - bit array length and the total number of bits(or hash functions ?? ) !
List pgsql-hackers
Thanks Fabien,

But the 2 direct questions i have are : 

1. What is the structure of the Bloom Index ? Can you please let me know what are the fields of a Bloom Index ? Is it just the Item Pointer and BloomSignatureWord ?
When i describe my bloom index it looks like following. 

postgres=# \d+ foo.idx_bloom_bar 
                   Index "foo.idx_bloom_bar"
 Column  |  Type   | Key? | Definition | Storage | Stats target 
---------+---------+------+------------+---------+--------------
 id      | integer | yes  | id         | plain   | 
 dept    | integer | yes  | dept       | plain   | 
 id2     | integer | yes  | id2        | plain   | 
 id3     | integer | yes  | id3        | plain   | 
 id4     | integer | yes  | id4        | plain   | 
 id5     | integer | yes  | id5        | plain   | 
 id6     | integer | yes  | id6        | plain   | 
 zipcode | integer | yes  | zipcode    | plain   | 
bloom, for table "foo.bar"
Options: length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4, col8=4


2. If we are storing just one signature word per row, how is this aggregated for all column values of that row into one signature in high level ? 
For example, if length = 64, does it mean that a bit array of 64 bits is generated per each row ? 
If col1=4, does it mean the value of col1 is passed to 4 hash functions that generate 4 positions that can be set to 1 in the bit array ?

On Sat, Jun 8, 2019 at 3:11 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Avinash,

> I was testing bloom indexes today. I understand bloom indexes uses bloom
> filters. [...]
>
> So the question here is -
> I assume - number of bits = k. Where k is the total number of hash
> functions used on top of the data that needs to validated. Is that correct
> ? If yes, why do we see the Index 1 performing better than Index 2 ?
> Because, the data has to go through more hash functions (4 vs 2) in Index 1
> than Index 2. So, with Index 1 it should take more time.
> Also, both the indexes have ZERO false positives.
> Please let me know if there is anything simple that i am missing here.

You may have a look at the blog entry about these parameters I redacted a
few year ago:

   http://blog.coelho.net/database/2016/12/11/postgresql-bloom-index.html

--
Fabien.




--
9000799060

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
Next
From: Siarhei Siniak
Date:
Subject: GiST limits on contrib/cube with dimension > 100?