Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)
Date
Msg-id 9b9fd273-18e7-2b07-7aa1-4b00ab59b8d1@2ndquadrant.com
In response to Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)
List pgsql-hackers
Hi,

On 09/19/2017 02:55 AM, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 5:13 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> On Mon, Sep 18, 2017 at 2:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Sep 18, 2017 at 1:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Uh, why does the planner need to be involved at all?
>>>
>>> Because it loses if the Bloom filter fails to filter anything.  That's
>>> not at all far-fetched; consider SELECT * FROM a.x, b.x WHERE a.x =
>>> b.x given a foreign key on a.x referencing b(x).
>>
>> Wouldn't a merge join be a lot more likely in this case anyway? Low
>> selectivity hash joins with multiple batches are inherently slow; the
>> wasted overhead of using a bloom filter may not matter.
>>
>> Obviously this is all pretty speculative. I suspect that this could be
>> true, and it seems worth investigating that framing of the problem
>> first.
> 
> ISTR Tomas Vondra doing some experiments with this a few years ago and
> finding that it was, in fact, a problem.
> 

You seem to have a better memory than I do, but you're right - I did some
experiments with this in 2015. The WIP patch and discussion are here:
 https://www.postgresql.org/message-id/5670946E.8070705@2ndquadrant.com

The whole idea was that with a Bloom filter we can reduce the number of
tuples (from the outer relation) written to batches, by discarding those
that cannot have a match.

The patch is fairly simple, and did not try to push the Bloom filters
down to scan nodes or anything like that. It might be a meaningful first
step, though, particularly for selective joins (where only a small number
of rows from the outer relation have a match in the hash table).
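
To make the idea concrete, here is a minimal sketch in Python. It is not
the WIP patch and does not reflect PostgreSQL's executor code; it only
simulates a multi-batch hash join and counts how many outer tuples would
be written to batch files. All names (BloomFilter, batch_of,
outer_tuples_spilled) and the sizing (16 bits per key, 4 hash functions)
are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, nbits, nhashes):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keep the step odd
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))


def batch_of(key, nbatches):
    """Hypothetical batch assignment: batch 0 is the in-memory batch."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[16:24], "little") % nbatches


def outer_tuples_spilled(inner_keys, outer_keys, nbatches, use_filter):
    """Count outer tuples that would be written to batch files."""
    bloom = None
    if use_filter:
        # Assumed sizing: 16 bits per inner key, 4 hash functions.
        bloom = BloomFilter(nbits=16 * max(1, len(inner_keys)), nhashes=4)
        for key in inner_keys:
            bloom.add(key)

    spilled = 0
    for key in outer_keys:
        if bloom is not None and not bloom.might_contain(key):
            continue  # cannot match any inner tuple, so never spill it
        if batch_of(key, nbatches) != 0:
            spilled += 1  # would be written out for a later batch
    return spilled


if __name__ == "__main__":
    inner = list(range(1000))
    outer = list(range(20000))  # only 5% of outer keys have a match
    print("without filter:", outer_tuples_spilled(inner, outer, 8, False))
    print("with filter:   ", outer_tuples_spilled(inner, outer, 8, True))
```

With a selective join like the one in the usage example, most outer
tuples are discarded before they reach a batch file. When every outer
tuple has a match (the foreign-key case Robert mentions above), the
filter rejects nothing and the spilled count is the same with or without
it, so building and probing the filter is pure overhead.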

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

