Re: [HACKERS] WIP: BRIN bloom indexes - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: [HACKERS] WIP: BRIN bloom indexes
Msg-id: 25f8a046-6b32-88a5-1b80-7ab520ccf290@2ndquadrant.com
In response to: Re: [HACKERS] WIP: BRIN bloom indexes (Nico Williams <nico@cryptonector.com>)
List: pgsql-hackers

Hi,

On 10/27/2017 07:17 PM, Nico Williams wrote:
> On Thu, Oct 19, 2017 at 10:15:32PM +0200, Tomas Vondra wrote:
> 
> A bloom filter index would, indeed, be wonderful.
> 
> Comments:
> 
> + * We use an optimisation that initially we store the uint32 values directly,
> + * without the extra hashing step. And only later filling the bitmap space,
> + * we switch to the regular bloom filter mode.
> 
> I don't think that optimization is worthwhile.  If I'm going to be using
> a Bloom filter, it's because I expect to have a lot of rows.
> 
> (Granted, if I CREATE TABLE .. LIKE .. INCLUDING INDEXES, and the new
> table just won't have lots of rows, then I might want this optimization,
> but I can always just drop the Bloom filter index, or not include
> indexes in the LIKE.)
> 

I think you're confusing "rows" and "distinct values". Furthermore, it's
rather irrelevant how many rows are in the table, because BRIN indexes
split the table into "ranges" that are 1MB by default (128 heap pages of
8kB), and only a certain number of rows fits into such a range - at, say,
a hundred rows per page that's a few tens of thousands of rows per range
at most, no matter how large the table is.

The idea of the optimization is to efficiently support cases where each
range contains only a small number of distinct values - which is quite
common in the cases I described in my initial message (with bursts of
rows with the same IP / client identifier etc.).
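
Roughly, the sketch below illustrates the two-phase idea - it's just an
illustration with made-up names, constants and probe derivation, not the
actual patch code. While the exact hashes still fit into the same space
the bitmap would occupy, we keep them verbatim; only once that space runs
out do we switch to the regular bloom filter representation:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FILTER_NBITS    4096    /* bitmap size in bits (made up) */
#define FILTER_NHASHES  4       /* number of hash functions (made up) */

typedef struct ExampleFilter
{
    bool        sorted;         /* still in the "exact hashes" phase? */
    int         nvalues;        /* number of hashes stored so far */
    union
    {
        uint32_t    values[FILTER_NBITS / 32];  /* exact phase */
        uint8_t     bitmap[FILTER_NBITS / 8];   /* bloom phase */
    }           data;
} ExampleFilter;

/* set the FILTER_NHASHES bits derived from one hash value */
static void
filter_set_bits(uint8_t *bitmap, uint32_t hash)
{
    for (int k = 0; k < FILTER_NHASHES; k++)
    {
        uint32_t    bit = (hash + k * 0x9e3779b9u) % FILTER_NBITS;

        bitmap[bit / 8] |= (1 << (bit % 8));
    }
}

/* switch from the exact representation to the bitmap */
static void
filter_to_bloom(ExampleFilter *f)
{
    uint32_t    saved[FILTER_NBITS / 32];
    int         nsaved = f->nvalues;

    memcpy(saved, f->data.values, sizeof(uint32_t) * nsaved);
    memset(f->data.bitmap, 0, sizeof(f->data.bitmap));
    f->sorted = false;

    for (int i = 0; i < nsaved; i++)
        filter_set_bits(f->data.bitmap, saved[i]);
}

static void
filter_add_hash(ExampleFilter *f, uint32_t hash)
{
    if (f->sorted)
    {
        /* exact phase: store the hash directly (sort/dedup omitted) */
        if (f->nvalues < FILTER_NBITS / 32)
        {
            f->data.values[f->nvalues++] = hash;
            return;
        }

        /* exact hashes no longer fit into the bitmap space: switch */
        filter_to_bloom(f);
    }

    filter_set_bits(f->data.bitmap, hash);
}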

> + * XXX Perhaps we could save a few bytes by using different data types, but
> + * considering the size of the bitmap, the difference is negligible.
> 
> A bytea is all that's needed.  See below.
> 
> + * XXX We could also implement "sparse" bloom filters, keeping only the
> + * bytes that are not entirely 0. That might make the "sorted" phase
> + * mostly unnecessary.
> 
> Filter compression is not worthwhile.  You want to have a fairly uniform
> hash distribution, and you want to end up with a Bloom filter that's not
> much more than 50% full.  That means that when near 50% full the filter
> will not be very sparse at all.  Optimizing for the not common case
> doesn't seem worthwhile, and adds complexity.
> 

Properly sizing the bloom filter requires knowledge of many variables,
in particular the number of distinct values expected to be added to the
filter. But we don't really know that number, and we don't even know
many of the values useful for estimating it (how many rows will fit into
a range, the number of distinct values in the whole table, etc.)

So the idea was to oversize the bloom filter, and then use the sparse
representation to reduce the size.
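
Just for reference, the sizing math itself is the textbook bloom filter
formula, nothing patch-specific: for n distinct values and a target
false positive rate p, the optimal filter needs m = -n * ln(p) / (ln 2)^2
bits and k = (m / n) * ln(2) hash functions. Something like:

#include <math.h>
#include <stdio.h>

/*
 * Textbook bloom filter sizing: for ndistinct values and a target
 * false positive rate fpr, the optimal filter uses
 *   m = -n * ln(fpr) / (ln 2)^2   bits, and
 *   k = (m / n) * ln 2            hash functions.
 */
static void
bloom_size(double ndistinct, double fpr, int *nbits, int *nhashes)
{
    double      ln2 = log(2.0);
    double      m = -ndistinct * log(fpr) / (ln2 * ln2);

    *nbits = (int) ceil(m);
    *nhashes = (int) round((m / ndistinct) * ln2);
}

int
main(void)
{
    int         nbits,
                nhashes;

    /* e.g. expecting ~1000 distinct values per range, 1% false positives */
    bloom_size(1000, 0.01, &nbits, &nhashes);
    printf("m = %d bits (%d bytes), k = %d\n", nbits, nbits / 8, nhashes);

    return 0;
}

With ~1000 distinct values per range and a 1% target that comes out to
roughly 1.2kB and 7 hash functions - which is why getting the ndistinct
estimate wrong matters so much for the filter size.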

> + * XXX We can also watch the number of bits set in the bloom filter, and
> + * then stop using it (and not store the bitmap, to save space) when the
> + * false positive rate gets too high.
> 
> Ah, right, what you want is a "Scalable Bloom Filter".
> 

That's not what I had in mind. My idea was to size the bloom filter on a
"best effort" basis, and if the filter for a range then turns out to be
too inaccurate, simply get rid of it. If that happens for many ranges, we
should probably allow specifying the sizing parameters as relopts for the
index.
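
The "too inaccurate" check itself is cheap, because the expected false
positive rate only depends on the fraction of bits already set - roughly
(bits_set / nbits)^k. A sketch (the 5% threshold is made up):

#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Estimate the current false positive rate from the fraction of bits
 * already set: a lookup of a value that was never added probes nhashes
 * more-or-less random bits, so
 *
 *   fpr ~= (nbits_set / nbits) ^ nhashes
 */
static double
bloom_estimate_fpr(const uint8_t *bitmap, int nbits, int nhashes)
{
    int         nset = 0;

    for (int i = 0; i < nbits; i++)
    {
        if (bitmap[i / 8] & (1 << (i % 8)))
            nset++;
    }

    return pow((double) nset / nbits, nhashes);
}

/* e.g. give up on the filter once ~5% of lookups would be false hits */
static bool
bloom_too_inaccurate(const uint8_t *bitmap, int nbits, int nhashes)
{
    return bloom_estimate_fpr(bitmap, nbits, nhashes) > 0.05;
}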

> A Scalable Bloom filter is actually a series of Bloom filters where all
> but the newest filter are closed to additions, and the newest filter is
> where you do all the additions.  You generally want to make each new
> filter bigger than the preceding one because when searching a Scalable
> Bloom filter you have to search *all* of them, so you want to minimize
> the number of filters.
> 
> Eventually, when you have enough sub-filters, you may want to re-write
> the whole thing so that you start with a single sub-filter that is large
> enough.
> 
> The format of the bytea might then be something like:
> 
> <size><bitmap>[[<size><bitmap>[...]]
> 
> where the last bitmap is the filter that is open to additions.
>

I think this is really over-engineering, and I certainly don't plan to
extend the patch in this direction.

I do not expect these parameters (particularly the number of distinct
values in a range) to change significantly over time, so the easiest
solution is to provide a reloption to specify that number in
CREATE/ALTER INDEX.

Alternatively, we could allow the summarization process to gather some
statistics (either on the whole range, or perhaps the whole table), and
store them somewhere for later use - for example, compute the number of
distinct values per range (in the existing data) and use that later.

> ...
> 
> Of course, for something like:
> 
>   SELECT a.*, b.* FROM a a JOIN b b USING (some_col);
> 
> a Bloom filter can only be used as an optimization to avoid using a
> slower index (or heap scan) on the inner table source.
> 
> What I'm getting at is that the query planner really needs to understand
> that a Bloom filter is a probabilistic data structure.
> 

It does, and we never produce incorrect results. Which seems like the
right thing to do.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


