Re: WIP: bloom filter in Hash Joins with batches - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: WIP: bloom filter in Hash Joins with batches
Msg-id 5685924B.6070403@2ndquadrant.com
In response to WIP: bloom filter in Hash Joins with batches  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

attached is v2 of the patch, with a number of improvements:

0) This relies on the other hashjoin patches (delayed build of
    buckets and batching), as they allow sizing the bloom filter.

1) enable_hashjoin_bloom GUC

    This is mostly meant for debugging and testing, not for committing.

2) Outer joins should be working fine now. That is, the results should
    be correct and faster as the outer rows without matches should not
    be batched at all.

3) The bloom filter is now built for all hash joins, not just when
    batching is happening. I've been a bit skeptical about the
    non-batched cases, but it seems that I can get a sizable speedup
    (~20-30%, depending on the selectivity of the join).

4) The code is refactored quite a bit, adding BloomFilterData instead
    of just sprinkling the fields on HashJoinState or HashJoinTableData.

5) To size the bloom filter, we now use a HyperLogLog counter, which we
    now have in core thanks to the sorting improvements done by Peter
    Geoghegan. This allows making the bloom filter much smaller when
    possible.

    The patch also extends the HyperLogLog API a bit (which I'll submit
    to the CF independently).
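
To illustrate point 5, here is a minimal sketch of how a distinct-value
estimate (such as one produced by a HyperLogLog counter) can drive the
filter size. It is not code from the patch: BloomParams, bloom_params()
and next_pow2() are hypothetical names, and rounding the size up to a
power of two is my assumption. It relies only on the standard bloom
filter math: for a target false positive rate p the filter needs about
-ln(p) / (ln 2)^2 bits per distinct element (roughly 9.6 bits for
p = 1%), and the matching number of hash functions is about
bits_per_elem * ln 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct for illustration; not the patch's BloomFilterData. */
typedef struct BloomParams
{
    uint64_t    nbits;      /* size of the bit array */
    int         nhashes;    /* number of hash functions to apply */
} BloomParams;

/* Round up to the next power of two, so lookups can mask instead of mod. */
static uint64_t
next_pow2(uint64_t v)
{
    uint64_t    p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Size the filter from an estimated number of distinct values.
 *
 * bits_per_elem is chosen by the caller from the target false positive
 * rate (about 9.6 for 1%). The hash count is derived from the requested
 * bits_per_elem, before the power-of-two rounding.
 */
static BloomParams
bloom_params(uint64_t ndistinct, double bits_per_elem)
{
    BloomParams bp;

    assert(ndistinct > 0);
    assert(bits_per_elem >= 1.0);

    bp.nbits = next_pow2((uint64_t) (ndistinct * bits_per_elem) + 1);

    /* k = bits_per_elem * ln(2), rounded to nearest, at least 1 */
    bp.nhashes = (int) (bits_per_elem * 0.6931 + 0.5);
    if (bp.nhashes < 1)
        bp.nhashes = 1;

    return bp;
}
```

With these assumptions, an inner relation of 1M rows but only 10k
distinct keys gets a 128 kbit filter instead of a 16 Mbit one, which is
the kind of saving a distinct-value estimate makes possible.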


There's a bunch of comments in the code, mostly with ideas about more
possible improvements.

The main piece missing in the patch (IMHO) is optimizer code deciding
whether to enable the bloom filter for a given hash join, based on
cardinality estimates, and executor code disabling the bloom filter if
it turns out to be inefficient. I don't think that's a major issue at
this point, though, and I think it'll be easier to do based on testing
the current patch.
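
To make the executor part more concrete, one possible shape for such a
runtime check is sketched below. None of this is in the patch: the
BloomStats struct, the function names and both thresholds are
hypothetical, picked only for illustration. The idea is to count how
many lookups the filter rejects and to switch it off once a sample
shows that it eliminates too few outer rows to pay for itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds; real values would have to come from testing. */
#define BLOOM_SAMPLE_LOOKUPS    1000    /* re-evaluate every N lookups */
#define BLOOM_MIN_REJECT_RATIO  0.10    /* keep filter if >= 10% rejected */

/* Hypothetical per-join counters for the bloom filter. */
typedef struct BloomStats
{
    uint64_t    nlookups;   /* lookups performed while enabled */
    uint64_t    nrejected;  /* lookups where the filter said "no match" */
    bool        enabled;    /* false once the filter has been disabled */
} BloomStats;

static void
bloom_stats_init(BloomStats *stats)
{
    stats->nlookups = 0;
    stats->nrejected = 0;
    stats->enabled = true;
}

/*
 * Record the outcome of one filter lookup. After every
 * BLOOM_SAMPLE_LOOKUPS lookups, disable the filter if the fraction of
 * rejected rows so far is below BLOOM_MIN_REJECT_RATIO.
 */
static void
bloom_stats_record(BloomStats *stats, bool rejected)
{
    if (!stats->enabled)
        return;

    stats->nlookups++;
    if (rejected)
        stats->nrejected++;

    if (stats->nlookups % BLOOM_SAMPLE_LOOKUPS == 0 &&
        stats->nrejected < BLOOM_MIN_REJECT_RATIO * stats->nlookups)
        stats->enabled = false;
}
```

The caller would check stats->enabled before consulting the filter, so
a disabled filter costs a single branch per outer row.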

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
