Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Date
Msg-id 4d910260-f22c-4d95-a40d-c177fcc9419c@enterprisedb.com
Whole thread Raw
In response to Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)  (vignesh C <vignesh21@gmail.com>)
Responses Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
List pgsql-hackers
On 1/14/24 12:18, vignesh C wrote:
> On Fri, 14 Jul 2023 at 20:17, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 7/9/23 23:44, Tomas Vondra wrote:
>>> ...
>>>>> Yes, my previous message was mostly about backwards compatibility, and
>>>>> this may seem a bit like an argument against it. But that message was
>>>>> more a question "If we do this, is it actually backwards compatible the
>>>>> way we want/need?")
>>>>>
>>>>> Anyway, I think the BrinDesc scratch space is a neat idea, I'll try
>>>>> doing it that way and report back in a couple days.
>>>>
>>>> Cool. In 0005-Support-SK_SEARCHARRAY-in-BRIN-bloom-20230702.patch, you
>>>> used the preprocess function to pre-calculate the scankey's hash, even
>>>> for scalars. You could use the scratch space in BrinDesc for that,
>>>> before doing anything with SEARCHARRAYs.
>>>>
>>>
>>> Yeah, that's a good idea.
>>>
>>
>> I started looking at this (the scratch space in BrinDesc), and it's not
>> as straightforward. The trouble is BrinDesc is "per attribute" but the
>> scratch space is "per scankey" (because we'd like to sort values from
>> the scankey array).
>>
>> With the "new" consistent functions (that get all scan keys at once)
>> this probably is not an issue, because we know which scan key we're
>> processing and so we can map it to the scratch space. But with the old
>> consistent function that's not the case. Maybe we should support this
>> only with the "new" consistent function variant?
>>
>> This would however conflict with the idea to have a separate consistent
>> function for arrays, which "splits" the scankeys into multiple groups
>> again. There could be multiple SAOP scan keys, and then what?
>>
>> I wonder if the scratch space should be in the ScanKey instead?
> 
> Are we planning to post an updated patch for this? If the interest has
> gone down and if there are no plans to handle this I'm thinking of
> returning this commitfest entry in this commitfest and can be opened
> when there is more interest.
> 

I still think the patch is a good idea and plan to get back to it, but
probably not in this CF. Given that the last update if from July, it's
fair to bump it - either RWF or just move to the next CF. Up to you.

As for the patch, I wonder if Heikki has some idea what to do about the
scratch space? I got stuck on thinking about how to do this with the two
types of consistent functions we support/allow.

To articulate the issue more clearly - the scratch space is "per index"
but we need scratch space "per index key". That's fine - we can simply
have pointers to multiple scratch spaces, I think.

But we have two ways to do consistent functions - the "old" gets scan
keys one attribute at a time, "new" gets all at once. For the "new" it's
not a problem, it's simple to identify the right scratch space. But for
the "old" one, how would that happen? The consistent function has no
idea which index key it's operating on, and how to identify the correct
scratch space.

I can think of two ways to deal with this:

1) Only allow SK_SEARCHARRAY for indexes supporting new-style consistent
functions (but I'm not sure how, considering amsearcharray is set way
before we know what the opclass does, or whether it implements the old
or new consistent function).

2) Allow SK_SEARCHARRAY even with old consistent function, but do some
dance in bringetbitmap() so to set the scratch space accordingly before
the call.

Now that I read Heikki's messages again, I see he suggested assigning a
new procnum to a consistent function supporting SK_SEARCHARRAY, which
seems to be very close to (1). Except that we'd now have 3 ways to
define a consistent function, and that sounds a bit ... too much?

Anyway, thinking about (1), I'm still not sure what to do about existing
opclasses - it'd be nice to have some backwards compatible solution,
without breaking everything and forcing everyone to implement all the
new stuff. Which is kinda why we already have two ways to do consistent
functions. Presumably we'd have to implement some "default" handling by
translating the SK_SEARCHARRAY key into simple equality keys ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Custom explain options
Next
From: Maciek Sakrejda
Date:
Subject: Re: Printing backtrace of postgres processes