Re: WIP: bloom filter in Hash Joins with batches - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: WIP: bloom filter in Hash Joins with batches
Date
Msg-id CANP8+jKT=Vzv92mSv1Lh2tmeHyhmNBfa5xG6r1msgBT5QDf1Aw@mail.gmail.com
In response to WIP: bloom filter in Hash Joins with batches  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: WIP: bloom filter in Hash Joins with batches  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 15 December 2015 at 22:30, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

  3) Currently the bloom filter is used whenever we do batching, but it
     should really be driven by selectivity too - it'd be good to (a)
     estimate the fraction of 'fact' tuples having a match in the hash
     table, and not to do bloom if it's over ~60% or so. Also, maybe
     the code should count the matches at runtime, and disable the
     bloom filter if we reach some threshold.

Cool results.

It seems like a good idea to always build the bloom filter, then discard it if it turns out to be ineffective.

My understanding is that the bloom filter would be ineffective in any of these cases:
* Hash table is too small
* Bloom filter is too large
* Bloom selectivity > 50% - perhaps that can be applied dynamically, so we stop using the filter once it becomes ineffective

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
