Re: Minmax indexes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Minmax indexes
Date
Msg-id 20140617181659.GC3115@awork2.anarazel.de
Whole thread Raw
In response to Re: Minmax indexes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Minmax indexes  (Greg Stark <stark@mit.edu>)
Re: Minmax indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: Minmax indexes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-06-17 12:14:00 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 12:04 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Well, I'm not the guy who does things with geometric data, but I don't
> >> want to ignore the significant percentage of our users who are.  As
> >> you must surely know, the GIST implementations for geometric data
> >> types store bounding boxes on internal pages, and that seems to be
> >> useful to people.  What is your reason for thinking that it would be
> >> any less useful in this context?
> >
> > For me minmax indexes are helpful because they allow to generate *small*
> > 'coarse' indexes over large volumes of data. From my pov that's possible
> > possible because they don't contain item pointers for every contained
> > row.
> > That'ill imo work well if there are consecutive rows in the table that
> > can be summarized into one min/max range. That's quite likely to happen
> > for common applications of number of scalar datatypes. But the
> > likelihood of placing sufficiently many rows with very similar bounding
> > boxes close together seems much less relevant in practice. And I think
> > that's generally likely for operations which can't be well represented
> > as btree opclasses - the substructure that implies inside a Datum will
> > make correlation between consecutive rows less likely.
> 
> Well, I don't know: suppose you're loading geospatial data showing the
> location of every building in some country.  It might easily be the
> case that the data is or can be loaded in an order that provides
> pretty good spatial locality, leading to tight bounding boxes over
> physically consecutive data ranges.

Well, it might be doable to correlate them along one axis, but along
both?  That's more complicated... And even alongside one axis you
already get into problems if your geometries are irregularly sized.
Asingle large polygon will completely destroy indexability for anything
stored physically close by because suddently the minmax range will be
huge... So you'll need to cleverly sort for that as well.

I think hierarchical datastructures are so much better suited for this,
that there's little point trying to fit them into minmax. I can very
well imagine that there's benefit in a gist support for only storing one
pointer per block instead of one pointer per item or such. But it seems
like separate feature.

> But I'm not trying to say that we absolutely have to support that kind
> of thing; what I am trying to say is that there should be a README or
> a mailing list post or some such that says: "We thought about how
> generic to make this.  We considered A, B, and C.  We rejected C as
> too narrow, and A because if we made it that general it would have
> greatly enlarged the disk footprint for the following reasons.
> Therefore we selected B."

Isn't 'simpler implementation' a valid reason that's already been
discussed onlist? Obviously simpler implementation doesn't trump
everything, but it's one valid reason...
Note that I have zap to do with the design of this feature. I work for
the same company as Alvaro, but that's pretty much it...

> Basically, I think Heikki asked a good
> question - which was "could we abstract this more?" - and I can't
> recall seeing a clear answer explaining why we could or couldn't and
> what the trade-offs would be.

'could we abstract more' imo is a pretty bad design guideline. It's 'is
there benefit in abstracting more'. Otherwise you end up with way to
complicated systems.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Atomics hardware support table & supported architectures
Next
From: Claudio Freire
Date:
Subject: Re: Minmax indexes