BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys) |
Date | |
Msg-id | f6aab55c-35eb-4118-a7ff-571c62e6cc39@enterprisedb.com Whole thread Raw |
Responses |
Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
|
List | pgsql-hackers |
Hi, while experimenting with BRIN indexes after a couple FOSDEM discussions, I ran into the existing limitation that BRIN indexes don't handle array scan keys. So BRIN indexes can be used for conditions like WHERE a IN (1,2,3,4,5) but we essentially treat the values as individual scan keys, and for each one we scan the BRIN index and build/update the bitmap. Which for large indexes may be fairly expensive - the cost is proportional to the number of values, so if building the bitmap for 1 value takes 10ms, for 100 values it'll take ~1000ms. It's not hard to construct cases like this (e.g. when using indexes with small pages_per_range values) etc. Of course, if the query does a lot of other expensive stuff, this cost may be insignificant. I'm not sure how often people do queries with such conditions. But I've been experimenting with features that'd build such paths, so I took a stab at a PoC, which can significantly reduce the time needed to build the bitmaps. And there's a couple more interesting opportunities. 0001 - Support SK_SEARCHARRAY in BRIN minmax -------------------------------------------- The 0001 part does a "naive" SK_SEARCHARRAY implementation for minmax. It simply sets amsearcharray=true and then tweaks the consistent function to handle both the scalar and array scan keys. This is obviously rather inefficient, because the array is searched linearly. So yes, we don't walk the index repeatedly, but we have to compare each range to (almost-)all values. 0002 - Sort the array in brinrescan() and do binsearch ------------------------------------------------------ There's a simple way to optimize the naive approach by sorting the array and then searching in this array. If the array is sorted, we can search for the first value >= minvalue, and see if that is consistent (i.e. if it's <= maxval). In my experiments this cuts the time needed to build the bitmap for array to pretty much the same as for a single value. I think this is similar to the preprocessing of scan keys in b-tree, so brinrescan() is a natural way to do the sort. The problem however is where to store the result. Ideally, we'd store it in BrinOpaque (just like BTScanOpaque in btree), but the problem is we don't pass that to the consistent functions. Those only get the ScanKeys and stuff extracted from BrinOpaque. We might add a parameter to the "consistent" function, but that conflicts with not wanting to break existing extensions implementing their own BRIN indexes. We allow the opclasses to define "consistent" with either 4 or 5 arguments. Adding an argument would mean 5 or 6 arguments, but because of the backwards compatibility we'd need to support existing opclasses, and 5 is ambiguous :-/ In hindsight, I would probably not chose supporting both 4 and 5 arguments again. It makes it harder for us to maintain the code to make life easier for extensions, but I'm not aware of any out-of-core BRIN opclasses anyway. So I'd probably just change the API, it's pretty easy to update existing extensions. This patch however does a much simpler thing - it just replaces the array in the SK_SEARCHARRAY scan key with a sorted one. That works for for minmax, but not for bloom/inclusion, because those are not based on sorting. And the ArrayType is not great for minmax either, because it means we need to deconstruct it again and again, for each range. It'd be much better to deconstruct the array once. I'll get back to this ... 0003 - Support SK_SEARCHARRAY in BRIN inclusion ----------------------------------------------- Trivial modification to support array scan keys, can't benefit from sorting the array. 0004 - Support SK_SEARCHARRAY in BRIN bloom ------------------------------------------- Trivial modification to support array scan keys, can't benefit from sorted array either. But we might "preprocess" the keys in a different way - bloom needs to calculate two hashes per key, and at the moment it happens again and again for each range. So if you have 1M ranges, and SK_SEARCHARRAY query with 100 values, we'll do 100M calls to PROCNUM_HASH and 200M calls to hash_uint32_extended(). And our hash functions are pretty expensive, certainly compared to the fast functions often used for bloom filters. So the preprocessing might actually calculate the hash functions once, and then only reuse those in the "consistent" function. 0005 is a dirty PoC illustrating the benefit of caching the hashes. Unfortunately, this complicates things, because it means: * The scan key preprocessing is not universal for all BRIN opclasses, because some opclasses, i.e. each BRIN opclass might have optional BRIN_PROCNUM_PREPROCESS which would preprocess the keys the way the opclass would like. * We can simply replace the array in the scan key the way minmax does that with the sorted array, because the data type is not the same (hashes are uint64). When I started to write this e-mail I thought there's pretty much just one way to move this forward: 1) Add a BRIN_PROCNUM_PREPROCESS to BRIN, doing the preprocessing (if not defined, the key is not preprocessed. 2) Store the preprocessed keys in BrinOpaque. 3) Modify the BRIN API to allow passing the preprocessed keys. As mentioned earlier, I'm not sure how difficult would it be to maintain backwards compatibility, considering the number of arguments of the consistent function would be ambiguous. Maybe the existence of BRIN_PROCNUM_PREPROCESS would be enough to decide this - if it's decided, no keys are preprocessed (and the opclass would not support SK_SEARCHARRAY). But now I realize maybe we can do without adding parameters to the "consistent" function. We might stash "preprocessed" scankeys into BrinOpaque, and pass them to the consistent function instead of the "original" scan keys (at least when the BRIN_PROCNUM_PREPROCESS). In fact, I see ScanKey even allows AM-specific flags, maybe it'd be useful to to mark the preprocessed keys. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
- 0001-Support-SK_SEARCHARRAY-in-BRIN-minmax-20230213.patch
- 0002-Sort-the-array-in-brinrescan-and-do-binsear-20230213.patch
- 0003-Support-SK_SEARCHARRAY-in-BRIN-minmax-multi-20230213.patch
- 0004-Support-SK_SEARCHARRAY-in-BRIN-inclusion-20230213.patch
- 0005-Support-SK_SEARCHARRAY-in-BRIN-bloom-20230213.patch
- 0006-PoC-caching-hash-values-in-BRIN-bloom-20230213.patch
pgsql-hackers by date: