Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)
Date
Msg-id CAH2-WzkUEKmYAH21F=WsBODYSCw8JZhx86VFrVQTHasXTS_MLg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Tue, Sep 19, 2017 at 1:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 09/19/2017 06:03 PM, Peter Geoghegan wrote:
>> I believe that parallelism makes the use of Bloom filter a lot more
>> compelling, too. Obviously this is something that wasn't taken into
>> consideration in 2015.
>>
>
> I haven't thought about it from that point of view. Can you elaborate
> why that would be the case? Sorry if this was explained earlier in this
> thread (I don't see it in the history, though).

Well, IPC and locking shared state to protect the state's structure is
potentially a big bottleneck for parallel hash join. I think that
Bloom filters were first used in distributed databases in the 1980s,
where a network round trip could be saved, which this is a little
like. That's why my guess is that Bloom filtering will be more
valuable when parallelism is used.

I think that right deep hash joins make this really compelling, if and
when they allow you to build multiple Bloom filters that can be
combined from multiple right deep hash table builds. I think you can
do fancy things like reduce the amount of I/O against a star schema
fact table considerably. You can use one Bloom filter (built against
some dimension table) to drive a bitmap index scan on a fact table
index, and then another Bloom filter (built against some other
dimension table) to drive another bitmap index scan. The executor then
does a bitmap AND to combine the two for a bitmap heap scan on the
fact table.

(Maybe this technique doesn't necessarily use a Bloom filter; it could
be some other type of bitmap.)

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] coverage analysis improvements