Re: WIP: BRIN multi-range indexes - Mailing list pgsql-hackers

From John Naylor
Subject Re: WIP: BRIN multi-range indexes
Date
Msg-id CACPNZCt=x-fOL0CUJbjR3BFXKgcd9HMPaRUVY9cwRe58hmd8Xg@mail.gmail.com
Whole thread Raw
In response to Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sun, Sep 13, 2020 at 12:40 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> <20200913 patch set>

Hi Tomas,

The cfbot fails to apply this, but with 0001 from 0912 it works on my
end, so going with that.

One problem I have is I don't get success with the new reloptions:

create index cd_multi on iot using brin(create_dt
timestamptz_minmax_multi_ops) with (values_per_range = 64);
ERROR:  unrecognized parameter "values_per_range"

create index  on iot using brin(create_dt timestamptz_bloom_ops) with
(n_distinct_per_range = 16);
ERROR:  unrecognized parameter "n_distinct_per_range"


Aside from that, I'm going to try to understand the code, and ask
questions. Some of the code might still change, but I don't think it's
too early to do some comment and docs proofreading. I'll do this in
separate emails for bloom and multi-minmax to keep it from being too
long. Perf testing will come sometime later.


Bloom
-----

+     greater than 0.0 and smaller than 1.0. The default values is 0.01,

+     rows per block). The default values is <literal>-0.1</literal>, and

s/values/value/

+     the minimum number of distinct non-null values is <literal>128</literal>.

I don't see 128 in the code, but I do see this, is this the intention?:

#define BLOOM_MIN_NDISTINCT_PER_RANGE 16

+ * Bloom filters allow efficient test whether a given page range contains
+ * a particular value. Therefore, if we summarize each page range into a
+ * bloom filter, we can easily and cheaply test wheter it containst values
+ * we get later.

s/test/testing/
s/wheter it containst/whether it contains/

+ * The index only supports equality operator, similarly to hash indexes.

s/operator/operators/

+ * The number of distinct elements (in a page range) depends on the data,
+ * we can consider it fixed. This simplifies the trade-off to just false
+ * positive rate vs. size.

Sounds like the first sentence should start with "although".

+ * of distinct items to be stored in the filter. We can't quite the input
+ * data, of course, but we may make the BRIN page ranges smaller - instead

I think you accidentally a word.

+ * Of course, good sizing decisions depend on having the necessary data,
+ * i.e. number of distinct values in a page range (of a given size) and
+ * table size (to estimate cost change due to change in false positive
+ * rate due to having larger index vs. scanning larger indexes). We may
+ * not have that data - for example when building an index on empty table
+ * it's not really possible. And for some data we only have estimates for
+ * the whole table and we can only estimate per-range values (ndistinct).

and

+ * The current logic, implemented in brin_bloom_get_ndistinct, attempts to
+ * make some basic sizing decisions, based on the table ndistinct estimate.

+ * XXX We might also fetch info about ndistinct estimate for the column,
+ * and compute the expected number of distinct values in a range. But

Maybe I'm missing something, but the first two comments don't match
the last one -- I don't see where we get table ndistinct, which I take
to mean from the stats catalog?

+ * To address these issues, the opclass stores the raw values directly, and
+ * only switches to the actual bloom filter after reaching the same space
+ * requirements.

IIUC, it's after reaching a certain size (BLOOM_MAX_UNSORTED * 4), so
"same" doesn't make sense here.

+ /*
+ * The 1% value is mostly arbitrary, it just looks nice.
+ */
+#define BLOOM_DEFAULT_FALSE_POSITIVE_RATE 0.01 /* 1% fp rate */

I think we'd want better stated justification for this default, even
if just precedence in other implementations. Maybe I can test some
other values here?

+ * XXX Perhaps we could save a few bytes by using different data types, but
+ * considering the size of the bitmap, the difference is negligible.

Yeah, I think it's obvious enough to leave out.

+ m = ceil((ndistinct * log(false_positive_rate)) / log(1.0 /
(pow(2.0, log(2.0)))));

I find this pretty hard to read and pgindent is going to mess it up
further. I would have a comment with the formula in math notation
(note that you can dispense with the reciprocal and just use
negation), but in code fold the last part to a constant. That might go
against project style, though:

m = ceil(ndistinct * log(false_positive_rate) * -2.08136);

+ * XXX Maybe the 64B min size is not really needed?

Something to figure out before commit?

+ /* assume 'not updated' by default */
+ Assert(filter);

I don't see how these are related.

+ big_h = ((uint32) DatumGetInt64(hash_uint32(value)));

I'm curious about the Int64 part -- most callers use the bare value or
with DatumGetUInt32().

Also, is there a reference for the algorithm for hash values that
follows? I didn't see anything like it in my cursory reading of the
topic. Might be good to include it in the comments.

+ * Tweak the ndistinct value based on the pagesPerRange value. First,

Nitpick: "Tweak" to my mind means to adjust an existing value. The
above is only true if ndistinct is negative, but we're really not
tweaking, but using it as a scale factor. Otherwise it's not adjusted,
only clamped.

+ * XXX We can only do this when the pagesPerRange value was supplied.
+ * If it wasn't, it has to be a read-only access to the index, in which
+ * case we don't really care. But perhaps we should fall-back to the
+ * default pagesPerRange value?

I don't understand this.

+static double
+brin_bloom_get_fp_rate(BrinDesc *bdesc, BloomOptions *opts)
+{
+ return BloomGetFalsePositiveRate(opts);
+}

The body of the function is just a macro not used anywhere else -- is
there a reason for having the macro? Also, what's the first parameter
for?

Similarly, BloomGetNDistinctPerRange could just be inlined or turned
into a function for readability.

+ * or null if it is not exists.

s/is not exists/does not exist/

+ /*
+ * XXX not sure the detoasting is necessary (probably not, this
+ * can only be in an index).
+ */

Something to find out before commit?

+ /* TODO include the sorted/unsorted values */

Patch TODO or future TODO?

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Allow CURRENT_ROLE in GRANTED BY
Next
From: Robert Haas
Date:
Subject: Re: fixing old_snapshot_threshold's time->xid mapping