Re: Minmax indexes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Minmax indexes
Date
Msg-id 20140618104638.GH3115@awork2.anarazel.de
Whole thread Raw
In response to Re: Minmax indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Minmax indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: Minmax indexes  (Vik Fearing <vik.fearing@dalibo.com>)
List pgsql-hackers
On 2014-06-18 12:18:26 +0300, Heikki Linnakangas wrote:
> On 06/17/2014 09:16 PM, Andres Freund wrote:
> >Well, it might be doable to correlate them along one axis, but along
> >both?  That's more complicated... And even alongside one axis you
> >already get into problems if your geometries are irregularly sized.
> 
> Sure, there are cases where it would be useless. But it's easy to imagine
> scenarios where it would work well, where points are loaded in clusters and
> points that are close to each other also end up physically close to each
> other.

> >Asingle large polygon will completely destroy indexability for anything
> >stored physically close by because suddently the minmax range will be
> >huge... So you'll need to cleverly sort for that as well.
> 
> That's an inherent risk with minmax indexes: insert a few rows to the
> "wrong" locations in the heap, and the selectivity of the index degrades
> rapidly.

Sure. But it's fairly normal to have natural clusteredness in many
columns (surrogate keys, dateseries type of data). Even if you insert
geometric types in a geographic clusters you'll have worse results
because some bounding boxes will be big and such.

And:
> The main problem with using it for geometric types is that you can't easily
> CLUSTER the table to make the minmax index effective again. But there are
> ways around that.

Which are? Sure you can try stuff like recreating the table, sorting
rows with boundary boxes area above threshold first, and then go on to
sort by the lop left corner of the bounding box. But that'll be neither
builtin, nor convenient, nor perfect. In contrast to a normal CLUSTER
for types with a btree opclass which will yield the perfect order.

> >Isn't 'simpler implementation' a valid reason that's already been
> >discussed onlist? Obviously simpler implementation doesn't trump
> >everything, but it's one valid reason...
> >Note that I have zap to do with the design of this feature. I work for
> >the same company as Alvaro, but that's pretty much it...
> 
> Without some analysis (e.g implementing it and comparing), I don't buy that
> it makes the implementation simpler to restrict it in this way. Maybe it
> does, but often it's actually simpler to solve the general case.

So to implement a feature one now has to implement the most generic
variant as a prototype first? Really?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: "MauMau"
Date:
Subject: Re: datistemplate of pg_database does not behave as per description in documentation
Next
From: Andres Freund
Date:
Subject: Re: Minmax indexes