Thread: Bloom index
README.bloom:

contrib/bloom provides a signature-file-based index.

Authors: Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov (oleg@sai.msu.su)

Notes: This is a *working* prototype, which could be finished up for
production use if there is interest. In particular, it lacks WAL support,
owing to a common limitation of contrib modules.

This index is useful if a table has many attributes and queries can involve
arbitrary combinations of them. A traditional btree index is faster than a
bloom index, but it would take too many btree indexes to support all
possible queries, while one needs only a single bloom index. A bloom index
supports only equality comparison. Since it is a signature file, not a
tree, it must always be read in full, but sequentially, so search
performance is constant and does not depend on the query. The Bloom filter
implementation (http://en.wikipedia.org/wiki/Bloom_filter) allows fast
exclusion of non-candidate tuples. Since the signature is a lossy
representation of all indexed attributes, search results must be rechecked
against the heap.

The user can specify the signature length (in uint16 words, default is 5)
and the number of bits that can be set per attribute (1 < colN < 2048).
For example:

CREATE INDEX bloomidx ON tbloom(i1,i2,i3)
       WITH (length=5, col1=2, col2=2, col3=4);

Here we create a bloom index with a signature length of 80 bits, with
attributes i1 and i2 mapped to 2 bits each and attribute i3 to 4 bits.
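(Editorial illustration: the per-attribute bit mapping described above can be
sketched in Python. The helper names and the seeding scheme are hypothetical --
this is not the patch's code, only a minimal model of the idea.)

```python
import random

SIGNATURE_BITS = 5 * 16                     # length=5 uint16 words -> 80 bits
COL_BITS = {"i1": 2, "i2": 2, "i3": 4}      # col1=2, col2=2, col3=4

def signature(row):
    """Build the 80-bit lossy signature: each attribute sets colN bits."""
    sig = 0
    for col, nbits in COL_BITS.items():
        # bit positions derived deterministically from (column, value)
        r = random.Random(hash((col, row[col])))
        for _ in range(nbits):
            sig |= 1 << r.randrange(SIGNATURE_BITS)
    return sig

def might_contain(sig, col, value):
    """Equality probe: true if all of the value's bits are set.
    May be a false positive; never a false negative."""
    r = random.Random(hash((col, value)))
    return all((sig >> r.randrange(SIGNATURE_BITS)) & 1
               for _ in range(COL_BITS[col]))
```

By construction there are no false negatives -- a row that really contains the
probed value always passes -- which is why the results must then be rechecked
against the heap, as the note above says.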
Todo:
* add more opclasses
* better configurability
* add support for arrays with "contains" and "intersection" operations

Example of usage:

select (generate_series(1,1000)*random())::int as i1,
       (generate_series(1,1000)*random())::int as i2,
       (generate_series(1,1000)*random())::int as i3,
       (generate_series(1,1000)*random())::int as i4,
       (generate_series(1,1000)*random())::int as i5,
       (generate_series(1,1000)*random())::int as i6,
       (generate_series(1,1000)*random())::int as i7,
       (generate_series(1,1000)*random())::int as i8,
       (generate_series(1,1000)*random())::int as i9,
       (generate_series(1,1000)*random())::int as i10,
       (generate_series(1,1000)*random())::int as i11,
       (generate_series(1,1000)*random())::int as i12,
       (generate_series(1,1000)*random())::int as i13
  into tbloom;
create index bloomidx on tbloom using bloom(i1,i2,i3,i4,i5,i6,i7,i8,i9,i10,i11,i12);
select pg_relation_size('bloomidx');
create index btree_idx on tbloom(i1,i2,i3,i4,i5,i6,i7,i8,i9,i10,i11,i12);
select pg_relation_size('btree_idx');

=# explain analyze select * from tbloom where i2=20 and i10=15;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=1.50..5.52 rows=1 width=52) (actual time=0.057..0.057 rows=0 loops=1)
   Recheck Cond: ((i2 = 20) AND (i10 = 15))
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..1.50 rows=1 width=0) (actual time=0.041..0.041 rows=9 loops=1)
         Index Cond: ((i2 = 20) AND (i10 = 15))
 Total runtime: 0.081 ms
(5 rows)

A seqscan is slower:

=# set enable_bitmapscan = off;
=# set enable_indexscan = off;
=# explain analyze select * from tbloom where i2=20 and i10=15;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..25.00 rows=1 width=52) (actual time=0.162..0.162 rows=0 loops=1)
   Filter: ((i2 = 20) AND (i10 = 15))
 Total runtime: 0.181 ms
(3 rows)

And a btree index will not be used for this query:
=# drop index bloomidx;
=# create index btree_idx on tbloom(i1,i2,i3,i4,i5,i6,i7,i8,i9,i10,i11,i12);
=# explain analyze select * from tbloom where i2=20 and i10=15;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..25.00 rows=1 width=52) (actual time=0.210..0.210 rows=0 loops=1)
   Filter: ((i2 = 20) AND (i10 = 15))
 Total runtime: 0.250 ms
(3 rows)

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
2010/1/13 Teodor Sigaev <teodor@sigaev.ru>:
> CREATE INDEX bloomidx ON tbloom(i1,i2,i3)
>        WITH (length=5, col1=2, col2=2, col3=4);
>
> Here, we create bloom index with signature length 80 bits and attributes
> i1, i2 mapped to 2 bits, attribute i3 - to 4 bits.

This is pretty darn slick.  I haven't looked at the code yet, but the
functionality sounds very cool, and I hope this is something we'll be
able to have as part of PG in the future (though more than likely not
for 8.5, I suspect).

...Robert
> This is pretty darn slick.  I haven't looked at the code yet, but the

It's just a prototype and/or proof of concept.

> functionality sounds very cool, and I hope this is something we'll be
> able to have as part of PG in the future (though more than likely not
> for 8.5, I suspect).

Of course.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
On Wed, 13 Jan 2010, Robert Haas wrote:

> 2010/1/13 Teodor Sigaev <teodor@sigaev.ru>:
>> CREATE INDEX bloomidx ON tbloom(i1,i2,i3)
>>        WITH (length=5, col1=2, col2=2, col3=4);
>>
>> Here, we create bloom index with signature length 80 bits and attributes
>> i1, i2 mapped to 2 bits, attribute i3 - to 4 bits.
>
> This is pretty darn slick.  I haven't looked at the code yet, but the
> functionality sounds very cool, and I hope this is something we'll be
> able to have as part of PG in the future (though more than likely not
> for 8.5, I suspect).

Yes, one of the goals of our announcement is to let potential users know
what we have done and what benefits this index provides. In other words,
sponsor(s), you are welcome!

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
2010/1/13 Teodor Sigaev <teodor@sigaev.ru>:
>> This is pretty darn slick.  I haven't looked at the code yet, but the
>
> It's just a prototype and/or proof of concept

The only thing that jumps out at me from the code itself is the use of
rand() and srand(), which seems unacceptable. At the very least the
srand(attno) step should be precalculated and stored in the metapage. At
least that would make it safe against data type hash functions which could
call rand() themselves.

However, this doesn't seem like a very good use of bloom filters to me.
Here your sets are always the same size, the n columns, and the order is
significant. What you're testing is whether the hash value for a specific
column is present in your "set" of hash values for the columns of that row.

Bloom filters need to be sized to have n bits per set element to maintain a
targeted false positive rate. And that false positive rate would be higher
than just having an n-bit hash for each set element. Normally they have the
advantage that you can test for membership anywhere in the set quickly --
but in this case you actually only want to test a specific position in the
set anyway.

So I think you can get a lower false positive rate by just truncating each
column's hash value to n bits and storing an array of them.

-- 
greg
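(Editorial illustration of the trade-off Greg describes, as a rough Python
simulation. The parameters -- a 48-bit shared signature for 12 columns with 4
bits each -- are made up for the sketch and are not taken from the patch.)

```python
import random

def bloom_check(row, column, value, sig_bits=48, bits_per_col=4):
    """Shared signature: every column sets bits_per_col bits in one bitmap,
    then the probe tests whether all of its own bits are set."""
    sig = 0
    for col, v in enumerate(row):
        r = random.Random(hash((col, v)))
        for _ in range(bits_per_col):
            sig |= 1 << r.randrange(sig_bits)
    r = random.Random(hash((column, value)))
    return all((sig >> r.randrange(sig_bits)) & 1 for _ in range(bits_per_col))

def truncated_check(row, column, value, bits=4):
    """Greg's alternative: store a per-column truncated hash and compare it."""
    mask = (1 << bits) - 1
    return (hash((column, value)) & mask) == (hash((column, row[column])) & mask)

def false_positive_rate(check, trials=5000):
    """How often a value that is NOT in the row still appears to match."""
    rng = random.Random(1)
    hits = 0
    for _ in range(trials):
        row = [rng.randrange(1 << 30) for _ in range(12)]
        if check(row, column=5, value=-1):      # -1 never occurs in the row
            hits += 1
    return hits / trials
```

With 12 columns all setting bits in one 48-bit signature, the bitmap ends up
densely populated, so the probe's empirical false positive rate comes out well
above the roughly 1/16 of a plain 4-bit per-column hash -- which is the point
of the argument above.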
Greg Stark <gsstark@mit.edu> writes:
> The only thing that jumps out at me from the code itself is the use of
> rand() and srand() which seems unacceptable.

Yeah: (1) rand() isn't a good random number generator, and (2) fooling with
the main random number sequence is user-unfriendly. If you need a private
source of random numbers you might base it on erand48, as geqo has done.

			regards, tom lane
On Mon, Jan 18, 2010 at 2:29 AM, Greg Stark <gsstark@mit.edu> wrote:
> Bloom filters need to be sized to have n bits per set element to
> maintain a targeted false positive rate. And that false positive rate
> would be higher than just having an n bit hash for each set element.
> Normally they have the advantage that you can test for membership
> anywhere in the set quickly -- but in this case you actually only want
> to test a specific position in the set anyways.
>
> So I think you can get a lower false positive rate by just truncating
> the each column's hash value to n bits and storing an array of them.

So, for example, if your bloom filter is 4 bits per column, your error rate
for a single-column where clause will be 1/2^(4/1.44), or a little worse
than returning 1 record in 7. If you test two or three columns then it'll
be about 1 in 49 or 1 in 343.

If you just stored a nibble for each column then you could look up the
specific nibble for the column in a single-column where clause and get an
error rate of 1 in 16. If you test two or three columns then it would be
1 in 256 or 1 in 4,096.

If you're aiming at very large numbers of columns with very large where
clauses then perhaps you only need, say, 2 bits per column. But for any
number of bits per column I think you're always better off just storing
that many bits of each hash rather than using a bloom filter for this.

Also, as it stands now it's an unordered list. I assume you have no plans
to implement vacuum for this? Perhaps it would be better to store a btree
of signatures ordered by htid. That would let you easily vacuum dead
entries, though it would necessitate random seeks to do the full scan of
the index unless you can solve the problem of full index scans concurrent
with page splits.

-- 
greg
> Yeah, (1) rand isn't a good random number generator and (2) fooling with

I tested the rand() and random() generators and the results were close to
the same; rand() was even slightly better.

> the main random number sequence is user-unfriendly. If you need a

That's really easy to fix.

> private source of random numbers you might base it on erand48 like geqo
> has done.

Thank you for the pointer.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
> So for example if your bloom filter is 4 bits per column your error
> rate for a single column where clause will be 1/2^(4/1.44) or a little
> worse than returning 1 record in 7. If you test two or three columns
> then it'll be about 1 in 49 or 1 in 343.

Hmm, I don't understand your calculations. In your example (4 bits per
column, index has only one column) each row is hashed to 4 bits in an
80-bit vector (signature), so the probability of collision is (1/80)^4.

> Also, as it stands now it's an unordered list. I assume you have no
> plans to implement vacuum for this? Perhaps it would be better to
> store a btree of signatures ordered by htid. That would let you easily
> vacuum dead entries. Though it would necessitate random seeks to do
> the full scan of the index unless you can solve the full index scan
> concurrent with page splits problem.

Vacuum is already implemented, via a full scan of the index, as in all
other index types.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
2010/1/18 Teodor Sigaev <teodor@sigaev.ru>:
>> So for example if your bloom filter is 4 bits per column your error
>> rate for a single column where clause will be 1/2^(4/1.44) or a little
>> worse than returning 1 record in 7. If you test two or three columns
>> then it'll be about 1 in 49 or 1 in 343.
>
> Hmm, I don't understand your calculations. In your example (4 bits per
> column, index has only one column) each row is hashed by 4 bits in 80-bit
> vector (signature), so, probability of collision is (1/80)^4

Sorry, I meant: if you had n columns in the index, the signature was 4*n
bits, and the where clause has one key in the search condition.

According to some guy editing the wikipedia page, the number of bits per
value in the set that a bloom filter needs to achieve an error rate of e is
1.44*log_2(1/e). So the error rate for nbits bits per column would be
1/2^(nbits/1.44). That's larger than the error rate for a simple hash,
which would be 1/2^nbits.

The reason bloom filters are advantageous when checking for a value
*anywhere* in the set is that with a simple hash of each value you would
have that error rate for each check, so you would have a much higher false
positive rate for finding it somewhere. But in your case you know which
position you want to check.

-- 
greg
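(Editorial sanity check of the arithmetic: solving nbits = 1.44*log_2(1/e)
for e gives e = 2^(-nbits/1.44), versus 2^(-nbits) for a plain truncated
hash of the same width.)

```python
def bloom_error_rate(bits_per_value):
    """Error rate e such that bits_per_value = 1.44 * log2(1/e)."""
    return 2 ** (-bits_per_value / 1.44)

def plain_hash_error_rate(bits_per_value):
    """False positive rate of a truncated hash of the same width."""
    return 2 ** (-bits_per_value)
```

For 4 bits per value this gives about 1 in 6.9 for the bloom filter versus
exactly 1 in 16 for the nibble, matching the "a little worse than 1 record
in 7" estimate earlier in the thread.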
Teodor Sigaev <teodor@sigaev.ru> writes:
>> Yeah, (1) rand isn't a good random number generator and (2) fooling with
> I tested rand() and random() generator and results was close to the same,
> and rand() was even slightly better.

That is true on some platforms, but on others rand() is very bad.

			regards, tom lane