Bloom Filter lookup for hash joins - Mailing list pgsql-hackers

From Atri Sharma
Subject Bloom Filter lookup for hash joins
Date
Msg-id CAOeZVif-R-iLF966wEipk5By-KhzVLOqpWqurpaK3p5fYw-Rdw@mail.gmail.com
Whole thread Raw
Responses Re: Bloom Filter lookup for hash joins  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Bloom Filter lookup for hash joins  (Ants Aasma <ants@cybertec.at>)
Re: Bloom Filter lookup for hash joins  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Hi All,

I have been researching bloom filters and discussed it on IRC with
RhodiumToad and David Fetter, and they pointed me to the various
places that could potentially have bloom filters, apart from the
places that already have them currently.

I have been reading the current implementation of hash joins, and in
ExecScanHashBucket, which I understand is the actual lookup function,
we could potentially look at a bloom filter per bucket. Instead of
actually looking up each hash value for the outer relation, we could
just check the corresponding bloom filter for that bucket, and if we
get a positive, then lookup the actual values i.e. continue with our
current behaviour (since we could be looking at a false positive).

This doesn't involve a lot of new logic. We need to add a bloom filter
in HashJoinState and set it when the hash table is being made. Also,
one thing we need to look at is the set of hash functions being used
for the bloom filters. This can be a point of further discussion.

The major potential benefit we could gain is that bloom filters never
give false negatives. So, we could save a lot of lookup where the
corresponding bloom filter gives a negative.

This approach can also be particularly useful for hash anti joins,
where we look for negatives. Since bloom filters can easily be stored
and processed, by straight bit operations, we could be looking at a
lot of saving here.

I am not sure if we already do something like this. Please direct me
to the relevant parts in the code if we already do.

Regards,

Atri

--
Regards,

Atri
l'apprenant



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Reduce maximum error in tuples estimation after vacuum.
Next
From: Robins Tharakan
Date:
Subject: Re: Add more regression tests for dbcommands