Re: WIP: BRIN multi-range indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: WIP: BRIN multi-range indexes
Date
Msg-id d4aa7fa0-d06d-6584-9234-8c1696924dde@enterprisedb.com
In response to Re: WIP: BRIN multi-range indexes  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: WIP: BRIN multi-range indexes  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers

On 1/26/21 7:52 PM, John Naylor wrote:
> On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>  >
>  >
>  > On 1/23/21 12:27 AM, John Naylor wrote:
> 
>  > > Still, it would be great if multi-minmax can be a drop-in
>  > > replacement. I know there was a sticking point of a distance
>  > > function not being available on all types, but I wonder if that
>  > > can be remedied or worked around somehow.
>  > >
>  >
>  > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
>  > replacement for minmax (essentially, using these opclasses as the
>  > default ones, with the option to switch back to plain minmax). I'm not
>  > convinced we should do that, though. Imagine you have minmax indexes in
>  > your existing DB, it's working perfectly fine, and then we come and just
>  > silently change that during dump/restore. Is there some past example
>  > when we did something similar and it turned out to be OK?
> 
> I was assuming pg_dump can be taught to insert explicit opclasses for 
> minmax indexes, so that upgrade would not cause surprises. If that's 
> true, only new indexes would have the different default opclass.
> 

Maybe, I suppose we could do that. But I have always found such changes 
happening silently in the background a bit suspicious, because they may be 
quite confusing. I certainly wouldn't expect such a difference between 
creating a new index and an index created by dump/restore. Did we make such 
changes in the past? That might be a precedent, but I don't recall any 
example ...

>  > As for the distance functions, I'm pretty sure there are data types
>  > without "natural" distance - like most strings, for example. We could
>  > probably invent something, but the question is how much we can rely on
>  > it working well enough in practice.
>  >
>  > Of course, is minmax even the right index type for such data types?
>  > Strings are usually "labels" and not queried using range queries,
>  > although sometimes people encode stuff as strings (but then it's very
>  > unlikely we'll define the distance definition well). So maybe for those
>  > types a hash / bloom would be a better fit anyway.
> 
> Right.
> 
>  > But I do have an idea - maybe we can do without distances, in those
>  > cases. Essentially, the primary issue with minmax indexes is outliers, so
>  > what if we simply sort the values, keep one range in the middle and as
>  > many single points on each tail?
> 
> That's an interesting idea. I think it would be a nice bonus to try to 
> do something along these lines. On the other hand, I'm not the one 
> volunteering to do the work, and the patch is useful as is.
> 

IMO it's a fairly small amount of code, so I'll take a stab at it in the 
next version of the patch.
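
To make the distance-free idea quoted above a bit more concrete, here is a 
rough sketch in Python. It is only an illustration, not code from the patch: 
the function names, the max_entries budget, and the choice to count the 
middle range as a single entry are all assumptions made for this example. It 
needs only that the values can be sorted, so it also works for strings.

```python
from typing import List, Optional, Sequence, Tuple


def summarize_without_distance(
    values: Sequence, max_entries: int
) -> Tuple[List, Optional[Tuple], List]:
    """Hypothetical summary: (low_points, middle_range, high_points).

    Sort the distinct values, keep single points on both tails and
    collapse everything in between into one (min, max) range.
    Assumes max_entries >= 1; the range counts as one entry here.
    """
    ordered = sorted(set(values))
    if len(ordered) <= max_entries:
        # Everything fits as exact points, no range needed.
        return ordered, None, []

    tail_slots = max_entries - 1          # one slot is used by the range
    n_low = tail_slots // 2               # points kept on the low tail
    n_high = tail_slots - n_low           # points kept on the high tail

    low = ordered[:n_low]
    high = ordered[len(ordered) - n_high:] if n_high else []
    middle = ordered[n_low:len(ordered) - n_high]
    return low, (middle[0], middle[-1]), high


def may_contain(summary, key) -> bool:
    """True if the summarized page range might contain key."""
    low, middle, high = summary
    if key in low or key in high:
        return True
    return middle is not None and middle[0] <= key <= middle[1]


# Two outliers (1 and 1_000_000) would stretch a plain minmax range
# over everything; here they become single points on the tails.
numbers = [1, 1000, 1001, 1002, 1003, 1004, 1_000_000]
num_summary = summarize_without_distance(numbers, max_entries=3)

# Strings have no natural distance, but they do sort.
labels = ["aaa", "mango", "melon", "mint", "zzz"]
label_summary = summarize_without_distance(labels, max_entries=3)
```

With these inputs a lookup for 500 or for "banana" can be ruled out, while a 
single [min, max] range over the same data could not exclude either one. How 
many points to keep on each tail is a tuning question the sketch does not 
try to answer.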


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


