BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys) - Mailing list pgsql-hackers

From Tomas Vondra
Subject BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Date
Msg-id f6aab55c-35eb-4118-a7ff-571c62e6cc39@enterprisedb.com
Whole thread Raw
Responses Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
List pgsql-hackers
Hi,

while experimenting with BRIN indexes after a couple FOSDEM discussions,
I ran into the existing limitation that BRIN indexes don't handle array
scan keys. So BRIN indexes can be used for conditions like

    WHERE a IN (1,2,3,4,5)

but we essentially treat the values as individual scan keys, and for
each one we scan the BRIN index and build/update the bitmap. Which for
large indexes may be fairly expensive - the cost is proportional to the
number of values, so if building the bitmap for 1 value takes 10ms, for
100 values it'll take ~1000ms.

It's not hard to construct cases like this (e.g. when using indexes with
small pages_per_range values) etc. Of course, if the query does a lot of
other expensive stuff, this cost may be insignificant.

I'm not sure how often people do queries with such conditions. But I've
been experimenting with features that'd build such paths, so I took a
stab at a PoC, which can significantly reduce the time needed to build
the bitmaps. And there's a couple more interesting opportunities.


0001 - Support SK_SEARCHARRAY in BRIN minmax
--------------------------------------------
The 0001 part does a "naive" SK_SEARCHARRAY implementation for minmax.
It simply sets amsearcharray=true and then tweaks the consistent
function to handle both the scalar and array scan keys.

This is obviously rather inefficient, because the array is searched
linearly. So yes, we don't walk the index repeatedly, but we have to
compare each range to (almost-)all values.


0002 - Sort the array in brinrescan() and do binsearch
------------------------------------------------------
There's a simple way to optimize the naive approach by sorting the array
and then searching in this array. If the array is sorted, we can search
for the first value >= minvalue, and see if that is consistent (i.e. if
it's <= maxval).

In my experiments this cuts the time needed to build the bitmap for
array to pretty much the same as for a single value.

I think this is similar to the preprocessing of scan keys in b-tree, so
brinrescan() is a natural way to do the sort. The problem however is
where to store the result.

Ideally, we'd store it in BrinOpaque (just like BTScanOpaque in btree),
but the problem is we don't pass that to the consistent functions. Those
only get the ScanKeys and stuff extracted from BrinOpaque.

We might add a parameter to the "consistent" function, but that
conflicts with not wanting to break existing extensions implementing
their own BRIN indexes. We allow the opclasses to define "consistent"
with either 4 or 5 arguments. Adding an argument would mean 5 or 6
arguments, but because of the backwards compatibility we'd need to
support existing opclasses, and 5 is ambiguous :-/

In hindsight, I would probably not chose supporting both 4 and 5
arguments again. It makes it harder for us to maintain the code to make
life easier for extensions, but I'm not aware of any out-of-core BRIN
opclasses anyway. So I'd probably just change the API, it's pretty easy
to update existing extensions.

This patch however does a much simpler thing - it just replaces the
array in the SK_SEARCHARRAY scan key with a sorted one. That works for
for minmax, but not for bloom/inclusion, because those are not based on
sorting. And the ArrayType is not great for minmax either, because it
means we need to deconstruct it again and again, for each range. It'd be
much better to deconstruct the array once.

I'll get back to this ...


0003 - Support SK_SEARCHARRAY in BRIN inclusion
-----------------------------------------------
Trivial modification to support array scan keys, can't benefit from
sorting the array.


0004 - Support SK_SEARCHARRAY in BRIN bloom
-------------------------------------------
Trivial modification to support array scan keys, can't benefit from
sorted array either.

But we might "preprocess" the keys in a different way - bloom needs to
calculate two hashes per key, and at the moment it happens again and
again for each range. So if you have 1M ranges, and SK_SEARCHARRAY query
with 100 values, we'll do 100M calls to PROCNUM_HASH and 200M calls to
hash_uint32_extended(). And our hash functions are pretty expensive,
certainly compared to the fast functions often used for bloom filters.

So the preprocessing might actually calculate the hash functions once,
and then only reuse those in the "consistent" function.

0005 is a dirty PoC illustrating the benefit of caching the hashes.

Unfortunately, this complicates things, because it means:

* The scan key preprocessing is not universal for all BRIN opclasses,
  because some opclasses, i.e. each BRIN opclass might have optional
  BRIN_PROCNUM_PREPROCESS which would preprocess the keys the way the
  opclass would like.

* We can simply replace the array in the scan key the way minmax does
  that with the sorted array, because the data type is not the same
  (hashes are uint64).


When I started to write this e-mail I thought there's pretty much just
one way to move this forward:

1) Add a BRIN_PROCNUM_PREPROCESS to BRIN, doing the preprocessing (if
   not defined, the key is not preprocessed.

2) Store the preprocessed keys in BrinOpaque.

3) Modify the BRIN API to allow passing the preprocessed keys.

As mentioned earlier, I'm not sure how difficult would it be to maintain
backwards compatibility, considering the number of arguments of the
consistent function would be ambiguous.

Maybe the existence of BRIN_PROCNUM_PREPROCESS would be enough to decide
this - if it's decided, no keys are preprocessed (and the opclass would
not support SK_SEARCHARRAY).


But now I realize maybe we can do without adding parameters to the
"consistent" function. We might stash "preprocessed" scankeys into
BrinOpaque, and pass them to the consistent function instead of the
"original" scan keys (at least when the BRIN_PROCNUM_PREPROCESS). In
fact, I see ScanKey even allows AM-specific flags, maybe it'd be useful
to to mark the preprocessed keys.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Making Vars outer-join aware
Next
From: Tom Lane
Date:
Subject: Re: ExecRTCheckPerms() and many prunable partitions (sqlsmith)