Home > mailing lists

Re: Bloom Filter lookup for hash joins - Mailing list pgsql-hackers

From	Jeff Janes
Subject	Re: Bloom Filter lookup for hash joins
Date	June 26, 2013 18:31:09
Msg-id	CAMkU=1zqY-NtvGKgoPnGLhQLQ0gEZKMx71wH5dWhStePu+Jscg@mail.gmail.com Whole thread Raw
In response to	Re: Bloom Filter lookup for hash joins (Atri Sharma <atri.jiit@gmail.com>)
Responses	Re: Bloom Filter lookup for hash joins Re: Bloom Filter lookup for hash joins
List	pgsql-hackers

Tree view

On Wed, Jun 26, 2013 at 5:01 AM, Atri Sharma <atri.jiit@gmail.com> wrote:

> The problem here is that if the hash table is in memory, doing a hash
> table lookup directly is likely to be cheaper than a bloom filter
> lookup, even if the bloom filter fits into the processor cache and the
> hash table doesn't (10 last level cache hits is slower than one cache
> miss). Bloom filter will help when its not feasible to use an actual
> hash table (lots of large keys), the real lookup is slow (e.g. an
> index lookup), you are doing a lot of lookups to amortize the
> construction cost and the condition is expected to be selective (most
> lookups give a negative). There might be some dataware house types of
> queries where it would help, but it seems like an awfully narrow use
> case with a potential for performance regressions when the planner has
> a bad estimate.

Ok, sounds good. Cant we use bloom filters for the case where the hash
table doesnt fit in memory? Specifically, when reading from disk is
inevitable for accessing the hash table, we can use bloom filters for
deciding which keys to actually read from disk.

I don't think that sounds all that promising. When the hash table does not fit in memory, it is either partitioned into multiple passes, each of which do fit in memory, or it chooses a different plan altogether. Do we know under what conditions a Bloom filter would be superior to those options, and could we reliably detect those conditions?

Cheers,

Jeff

pgsql-hackers by date:

From: Rodrigo Gonzalez
Date: 26 June 2013, 18:28:11
Subject: Re: Kudos for Reviewers -- straw poll

From: Noah Misch
Date: 26 June 2013, 18:32:11
Subject: Re: MD5 aggregate

Re: Bloom Filter lookup for hash joins - Mailing list pgsql-hackers

Previous

Next