Re: WIP: BRIN multi-range indexes - Mailing list pgsql-hackers

From John Naylor
Subject Re: WIP: BRIN multi-range indexes
Msg-id CAFBsxsFudhzy1gUMp6fyj7xDXqZf5VPGC3krqsz42_0QGwcBBQ@mail.gmail.com
In response to Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
> On 1/23/21 12:27 AM, John Naylor wrote:

> > Still, it would be great if multi-minmax can be a drop in replacement. I
> > know there was a sticking point of a distance function not being
> > available on all types, but I wonder if that can be remedied or worked
> > around somehow.
> >
>
> Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
> replacement for minmax (essentially, using these opclasses as the
> default ones, with the option to switch back to plain minmax). I'm not
> convinced we should do that, though. Imagine you have minmax indexes in
> your existing DB, it's working perfectly fine, and then we come and just
> silently change that during dump/restore. Is there some past example
> when we did something similar and it turned out to be OK?

I was assuming pg_dump can be taught to emit explicit opclasses for existing minmax indexes, so that an upgrade would not cause surprises. If that's true, only newly created indexes would get the different default opclass.

> As for the distance functions, I'm pretty sure there are data types
> without "natural" distance - like most strings, for example. We could
> probably invent something, but the question is how much we can rely on
> it working well enough in practice.
>
> Of course, is minmax even the right index type for such data types?
> Strings are usually "labels" and not queried using range queries,
> although sometimes people encode stuff as strings (but then it's very
> unlikely we'll define the distance well). So maybe for those
> types a hash / bloom would be a better fit anyway.

Right.

> But I do have an idea - maybe we can do without distances, in those
> cases. Essentially, the primary issue with minmax indexes is outliers, so
> what if we simply sort the values, keep one range in the middle and as
> many single points as fit on each tail?

That's an interesting idea. I think it would be a nice bonus to try to do something along these lines. On the other hand, I'm not the one volunteering to do the work, and the patch is useful as is.
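
To make sure I understand the proposal, here is a rough sketch of it in Python. This is not code from the patch: the names summarize, might_match and tail_points are made up for illustration, and the only thing it assumes about the data type is a total order (no distance function). It sorts the distinct values, keeps tail_points single values from each end as points, and collapses everything in between into one range.

```python
from typing import List, Optional, Sequence, Tuple


def summarize(values: Sequence, tail_points: int) -> Tuple[List, Optional[Tuple]]:
    """Return (points, middle_range) for one page range.

    Hypothetical helper: keeps `tail_points` lowest and highest distinct
    values as single points and one [lo, hi] range for the rest.
    Only comparisons are used, so no distance function is needed.
    """
    ordered = sorted(set(values))
    if len(ordered) <= 2 * tail_points:
        # Few enough distinct values: store them all as points, no range.
        return ordered, None
    low = ordered[:tail_points]
    high = ordered[len(ordered) - tail_points:]
    middle = ordered[tail_points:len(ordered) - tail_points]
    return low + high, (middle[0], middle[-1])


def might_match(summary: Tuple[List, Optional[Tuple]], key) -> bool:
    """True if the page range may contain `key` (false positives allowed)."""
    points, middle = summary
    if key in points:
        return True
    return middle is not None and middle[0] <= key <= middle[1]


# Mostly clustered values plus one outlier on each tail.
ints = summarize([1000, 1001, 1002, 1003, 1004, 1, 999999], tail_points=1)

# Works the same way for a type with ordering but no natural distance.
strs = summarize(["apple", "banana", "cherry", "aaaa", "zzzz"], tail_points=1)
```

With these inputs a plain minmax summary would be [1, 999999] and would match a lookup for 500000, while the sketch above keeps 1 and 999999 as points and [1000, 1004] as the range, so that lookup can be skipped. Choosing how many points to spend on each tail is the part that a distance function would otherwise decide, so a fixed split like this is only one possible policy.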

--
John Naylor
EDB: http://www.enterprisedb.com
