Re: WIP: bloom filter in Hash Joins with batches - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: WIP: bloom filter in Hash Joins with batches
Date
Msg-id CAM3SWZTeAYcRSTb2Xet3gDv+gVvXFa4MWcGM6gzwoiV6UY1mmw@mail.gmail.com
Whole thread Raw
In response to Re: WIP: bloom filter in Hash Joins with batches  (Peter Geoghegan <pg@heroku.com>)
Responses Re: WIP: bloom filter in Hash Joins with batches  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Also, have you considered Hash join conditions with multiple
> attributes as a special case? I'm thinking of cases like this:

Sorry, accidentally fat-fingered my enter key before I was finished
drafting that mail. That example isn't useful, because there is no
early/cheap elimination of tuples while scanning the outer relation.

My point about bloom filters on only one attribute (or perhaps
multiple bloom filters on multiple attributes) is that you might be
able to make the bloom filter more effective, particularly relative to
the amount of memory available, by only building it for one attribute
where that distinction (one attribute vs multiple) happens to exist.

Simon said: "So the objective must be to get a Bloom Filter that is
small enough that it lives in a higher/faster level of cache than the
main Hash table". I agree that that's really important here, although
I did note that Tomas wasn't so sure, emphasizing the importance of
avoiding serialization and deserialization overhead -- and certainly,
that's why the patch helps some cases a lot. But Simon's point applies
more to the worst case than the best or average cases, and evidently
we have plenty of worrying about the worst case left to do here.

So, one attribute could have relatively low cardinality, and thus
fewer distinct items in the bloom filter's set, making it far slower
to degrade (i.e. have increased probability of false positives) as
compared to a composite of two or more attributes, and yet perhaps not
significantly less useful in terms of its ability to eliminate
(de)serialization overhead. Or, with two bloom filters (one per
attribute), one bloom filter could degrade far faster than the other
(due to having more distinct values), often very unpredictably, and
yet it wouldn't matter because we'd discard the bad one (maybe really
early, during the scan of the inner relation, using HLL).

I notice that all these example queries involve less-than operators
(inner relation tuples are thereby prevented from being entered into
the hash table). It might not be a terrible idea to hash abbreviated
keys (or even truncated abbreviated keys) for the bloom filter in
certain cases, in particular when there is a correlation in the inner
relation attributes (equi-joined attribute, and attribute that is
other part of qual). There might not be a correlation, of course, but
with some care hashing abbreviated keys may have virtually no
additional overhead, making it worth doing despite the general
uncertainty about any correlation existing. If you're going to accept
that the bloom filter can't be very large, which I think you must,
then hashing an entire value may be no better than hashing an
abbreviated key (those 8 bytes tend to have a lot of entropy). This
point is rather hand-wavy, though.

I suspect that BLOOM_ERROR_RATE isn't the most useful constraint here.
Is the degree to which it helps in sympathetic cases at all
commensurate with the degree to which it hurts in unsympathetic cases?
In your "low selectivity" cases (the sympathetic cases), there are
relatively few values represented in the main hash table, and
therefore in the bloom filter, and so a smaller bloom filter results
automatically anyway. But in the bad cases, the BLOOM_ERROR_RATE
constraint just wastes memory bandwidth for no gain, I think. If the
number of elements is so large that a reasonable fixed sized bloom
filter has a poor false positive rate anyway, we may have already
lost.

My sense is that we should try to make the bloom filter as cheap as
possible more so than trying to make sure it's useful before much work
has been done.

Is it okay that you don't treat MCV/skew values as special? Actually,
looking at this code, I think I notice something:

@@ -238,6 +238,15 @@ ExecHashJoin(HashJoinState *node)
hashvalue);              node->hj_CurTuple = NULL;
 

+               /* If still in the first batch, we check the bloom filter. */
+               if ((hashtable->curbatch == 0) &&
+                   (! ExecHashBloomCheckValue(hashtable, hashvalue)))
+               {
+                       /* no matches; check for possible outer-join fill */
+                       node->hj_JoinState = HJ_FILL_OUTER_TUPLE;
+                       continue;
+               }

Haven't we already established by now that the value of
"node->hj_CurSkewBucketNo" may not be INVALID_SKEW_BUCKET_NO, and so
the value certainly was found in the inner relation scan? Why bother
doing anything with the bloom filter in that common case? (Note that
I'm not suggesting that we don't establish "node->hj_CurSkewBucketNo"
first).

Also, are you aware of this?

http://www.nus.edu.sg/nurop/2010/Proceedings/SoC/NUROP_Congress_Cheng%20Bin.pdf

It talks about bloom filters for hash joins in PostgreSQL
specifically. Interestingly, they talk about specific TPC-H queries.

BTW, I think code like this needs better comments:

+static BloomFilter
+BloomFilterInit(double nrows, double error)
+{
+   /* perhaps we should round nbits to multiples of 8 ? */
+   int nbits = ceil((nrows * log(error)) / log(1.0 / (pow(2.0, log(2.0)))));
+   int nhashes = round(log(2.0) * nbits / nrows);
+
+   BloomFilter filter = palloc0(offsetof(BloomFilterData, data) +
((nbits + 7) / 8));
+
+   filter->nbits = nbits;
+   filter->nhashes = nhashes;

Have you experimentally verified that you get the expected probability
of false positives in practice?

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Marko Tiikkaja
Date:
Subject: Re: Add numeric_trim(numeric)
Next
From: Peter Geoghegan
Date:
Subject: Re: WIP: bloom filter in Hash Joins with batches