Re: WIP: BRIN multi-range indexes - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: WIP: BRIN multi-range indexes
Date:
Msg-id: 20200918113824.n7wfrccp7jiv7mvk@development
In response to: Re: WIP: BRIN multi-range indexes (John Naylor <john.naylor@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Sep 17, 2020 at 05:42:59PM -0400, John Naylor wrote:
>On Thu, Sep 17, 2020 at 12:34 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Thu, Sep 17, 2020 at 10:33:06AM -0400, John Naylor wrote:
>> >On Sun, Sep 13, 2020 at 12:40 PM Tomas Vondra
>> ><tomas.vondra@2ndquadrant.com> wrote:
>> >> <20200913 patch set>
>>
>> But those are opclass parameters, so the parameters are not specified in
>> the WITH clause, but right after the opclass name:
>>
>>    CREATE INDEX idx ON table USING brin (
>>        bigint_col int8_minmax_multi_ops(values_per_range = 15)
>>    );
>
>D'oh!
>
>> >+ * The index only supports equality operator, similarly to hash indexes.
>> >
>> >s/operator/operators/
>> >
>>
>> Hmmm, are there really multiple equality operators?
>
>Ah, I see what you meant -- then "_the_ equality operator" is what we want.
>
>> The number of items the comment refers to is this:
>>
>>    /* how many uint32 hashes can we fit into the bitmap */
>>    int maxvalues = filter->nbits / (8 * sizeof(uint32));
>>
>> where nbits is the size of the bloom filter. So I think the "same" is
>> quite right here.
>
>Ok, I get it now.
>
>> >+ /*
>> >+  * The 1% value is mostly arbitrary, it just looks nice.
>> >+  */
>> >+#define BLOOM_DEFAULT_FALSE_POSITIVE_RATE 0.01  /* 1% fp rate */
>> >
>> >I think we'd want better stated justification for this default, even
>> >if just precedence in other implementations. Maybe I can test some
>> >other values here?
>> >
>>
>> Well, I don't know how to pick a better default :-( Ultimately it's a
>> trade-off between larger indexes (lower false positive rate) and scanning
>> a larger fraction of a table (higher false positive rate). Then there's
>> the restriction that the whole index tuple needs to fit on a single
>> 8kB page.
>
>Well, it might be a perfectly good default, and it seems common in
>articles on the topic, but the comment is referring to aesthetics. :-)
>I still intend to test some cases.
I think we may formulate this as a question of how much I/O we need to do
for a random query, and pick the false positive rate minimizing that. For
a single BRIN range an approximation might look like this:

    bloom_size(fpr, ...) + (fpr * range_size) + (selectivity * range_size)

The "selectivity" is the true selectivity of ranges, and it might be
estimated from a per-row selectivity, I guess. But it does not matter much,
because this term is constant and independent of the false positive rate,
so we can ignore it. Which leaves us with

    bloom_size(fpr, ...) + (fpr * range_size)

We might solve this for fixed parameters (range_size, ndistinct, ...),
either analytically or by brute force, giving us the "optimal" fpr. The
trouble is that the bloom_size is restricted, and we don't really know the
limit - the whole index tuple needs to fit on a single 8kB page, and there
may be other BRIN summaries etc. So I've opted to use a somewhat defensive
default for the false positive rate.

>> >Also, is there a reference for the algorithm for hash values that
>> >follows? I didn't see anything like it in my cursory reading of the
>> >topic. Might be good to include it in the comments.
>> >
>>
>> This was suggested by Yura Sokolov [1] in 2017. The post refers to a
>> paper [2], but I don't recall which part describes "our" algorithm.
>>
>> [1] https://www.postgresql.org/message-id/94c173b54a0aef6ae9b18157ef52f03e@postgrespro.ru
>> [2] https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
>
>Hmm, I came across that paper while doing background reading. Okay,
>now I get that "% (filter->nbits - 1)" is the second hash function in
>that scheme. But now I wonder if that second function should actually
>act on the passed "value" (the original hash), so that they are
>actually independent, as required.
>In the language of that paper, the
>patch seems to have
>
>    g(x) = h1(x) + i*h2(h1(x)) + f(i)
>
>instead of
>
>    g(x) = h1(x) + i*h2(x) + f(i)
>
>Concretely, I'm wondering if it should be:
>
>    big_h = DatumGetUint32(hash_uint32(value));
>    h = big_h % filter->nbits;
>-   d = big_h % (filter->nbits - 1);
>+   d = value % (filter->nbits - 1);
>
>But I could be wrong.
>
>Also, I take it that f(i) = 1 in the patch, hence the increment here:
>
>+   h += d++;
>
>But it's a little hidden. That might be worth commenting, if I haven't
>completely missed something.
>

OK

>> >+ * Tweak the ndistinct value based on the pagesPerRange value. First,
>> >
>> >Nitpick: "Tweak" to my mind means to adjust an existing value. The
>> >above is only true if ndistinct is negative, but we're really not
>> >tweaking, but using it as a scale factor. Otherwise it's not adjusted,
>> >only clamped.
>> >
>>
>> OK. Perhaps 'adjust' would be a better term?
>
>I felt like rewriting the whole thing, but your original gets the
>point across ok, really.
>
>"If the passed ndistinct value is positive, we can just use that, but
>we also clamp the value to prevent over-sizing the bloom filter
>unnecessarily. If it's negative, it represents a multiplier to apply
>to the maximum number of tuples in the range (assuming each page gets
>MaxHeapTuplesPerPage tuples, which is likely a significant
>over-estimate), similar to the corresponding value in table
>statistics."
>
>> >+ /* TODO include the sorted/unsorted values */
>> >
>>
>> This was implemented as part of the discussion about pageinspect, and
>> I wanted some confirmation on whether the approach is acceptable or not
>> before spending more time on it. Also, the values are really just hashes
>> of the column values, so I'm not quite sure it makes sense to include
>> that. What would you suggest?
>
>My gut feeling is the hashes are not useful for this purpose, but I
>don't feel strongly either way.
>

OK. I share this feeling.
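For reference, the probe sequence under discussion could be sketched like
this (hypothetical standalone code, not the patch itself - the bitmap
layout and the function name are made up; only the h/d derivation and the
"h += d++" step mirror what the thread describes):

```c
#include <stdint.h>

/*
 * Sketch of the discussed double-hashing scheme: both h and d are derived
 * from the same 32-bit hash big_h, and d is incremented on every step
 * ("h += d++"), so consecutive probe positions are
 * h, h + d, h + 2*d + 1, ... modulo nbits.
 */
static void
bloom_add_hash(uint8_t *bitmap, uint32_t nbits, int nhashes, uint32_t big_h)
{
    uint32_t    h = big_h % nbits;
    uint32_t    d = big_h % (nbits - 1);
    int         i;

    for (i = 0; i < nhashes; i++)
    {
        /* set bit h in the filter's bitmap */
        bitmap[h / 8] |= (uint8_t) (1 << (h % 8));

        /* advance to the next probe position */
        h = (h + d++) % nbits;
    }
}
```

John's suggested change would then amount to deriving d from the original
"value" rather than from big_h, so that the two hash functions are
independent.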
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services