Home > mailing lists

Re: [HACKERS] WIP: BRIN bloom indexes - Mailing list pgsql-hackers

From	Nico Williams
Subject	Re: [HACKERS] WIP: BRIN bloom indexes
Date	October 27, 2017 20:17:13
Msg-id	20171027171712.GJ4496@localhost Whole thread Raw
In response to	[HACKERS] WIP: BRIN bloom indexes (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: [HACKERS] WIP: BRIN bloom indexes Re: [HACKERS] WIP: BRIN bloom indexes
List	pgsql-hackers

Tree view

On Thu, Oct 19, 2017 at 10:15:32PM +0200, Tomas Vondra wrote:

A bloom filter index would, indeed, be wonderful.

Comments:

+ * We use an optimisation that initially we store the uint32 values directly,
+ * without the extra hashing step. And only later filling the bitmap space,
+ * we switch to the regular bloom filter mode.

I don't think that optimization is worthwhile. If I'm going to be using
a Bloom filter, it's because I expect to have a lot of rows.

(Granted, if I CREATE TABLE .. LIKE .. INCLUDING INDEXES, and the new
table just won't have lots of rows, then I might want this optimization,
but I can always just drop the Bloom filter index, or not include
indexes in the LIKE.)

+ * XXX Perhaps we could save a few bytes by using different data types, but
+ * considering the size of the bitmap, the difference is negligible.

A bytea is all that's needed. See below.

+ * XXX We could also implement "sparse" bloom filters, keeping only the
+ * bytes that are not entirely 0. That might make the "sorted" phase
+ * mostly unnecessary.

Filter compression is not worthwhile. You want to have a fairly uniform
hash distribution, and you want to end up with a Bloom filter that's not
much more than 50% full. That means that when near 50% full the filter
will not be very sparse at all. Optimizing for the not common case
doesn't seem worthwhile, and adds complexity.

+ * XXX We can also watch the number of bits set in the bloom filter, and
+ * then stop using it (and not store the bitmap, to save space) when the
+ * false positive rate gets too high.

Ah, right, what you want is a "Scalable Bloom Filter".

A Scalable Bloom filter is actually a series of Bloom filters where all
but the newest filter are closed to additions, and the newest filter is
where you do all the additions. You generally want to make each new
filter bigger than the preceding one because when searching a Scalable
Bloom filter you have to search *all* of them, so you want to minimize
the number of filters.

Eventually, when you have enough sub-filters, you may want to re-write
the whole thing so that you start with a single sub-filter that is large
enough.

The format of the bytea might then be something like:

<size><bitmap>[[<size><bitmap>[...]]

where the last bitmap is the filter that is open to additions.

I wonder if there are write concurrency performance considerations
here...

It might be better to have a bytea value per-sub-filter so that there is
no lock contention for the closed filters. The closed sub-filters are
constant, thus not even shared locks are needed for them, and especially
not exclusive locks.

Writing to the filter will necessitate locking the entire open filter,
or else byte-range locking on it. Something to think about.

> Now, what about query performance? Unlike the "minmax" indexes, the
> "bloom" filter can only handle equality conditions.

A Bloom filter has non-zero false positives for existence, but zero
false positives for non-existence.

This means that a Bloom filter index can only be used for:

a) non-existence tests (with equality predicates, as you point out),

b) as an optimization to avoid slower index checks (or heap scans) when the Bloom filter indicates non-existence.

(b) is really just an internal application of (a).

There might be applications where false positives might be ok in a query
like:
SELECT a.* FROM a a JOIN b b USING (some_col);

but for most real-world queries like that, allowing false positives by
default would be very bad. An option in the index declaration could be
used to enable existence equality predicates, but I would not include
such an option initially -- let's see if anyone actually has a use case
for it first.

Of course, for something like:
SELECT a.*, b.* FROM a a JOIN b b USING (some_col);

a Bloom filter can only be used as an optimization to avoid using a
slower index (or heap scan) on the inner table source.

What I'm getting at is that the query planner really needs to understand
that a Bloom filter is a probabilistic data structure.

Nico
--

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

From: Robert Haas
Date: 27 October 2017, 19:29:10
Subject: Re: [HACKERS] Index only scan for cube and seg

From: Alvaro Herrera
Date: 27 October 2017, 20:27:30
Subject: Re: [HACKERS] Proposal: Local indexes for partitioned table

Re: [HACKERS] WIP: BRIN bloom indexes - Mailing list pgsql-hackers

Previous

Next