Re: Minmax indexes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Minmax indexes
Date
Msg-id 20140617160428.GE6836@awork2.anarazel.de
In response to Re: Minmax indexes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Minmax indexes
Re: Minmax indexes
List pgsql-hackers
On 2014-06-17 11:48:10 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 10:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-06-17 10:26:11 -0400, Robert Haas wrote:
> >> On Sat, Jun 14, 2014 at 10:34 PM, Alvaro Herrera
> >> <alvherre@2ndquadrant.com> wrote:
> >> > Robert Haas wrote:
> >> >> On Wed, Sep 25, 2013 at 4:34 PM, Alvaro Herrera
> >> >> <alvherre@2ndquadrant.com> wrote:
> >> >> > Here's an updated version of this patch, with fixes to all the bugs
> >> >> > reported so far.  Thanks to Thom Brown, Jaime Casanova, Erik Rijkers and
> >> >> > Amit Kapila for the reports.
> >> >>
> >> >> I'm not very happy with the use of a separate relation fork for
> >> >> storing this data.
> >> >
> >> > Here's a new version of this patch.  Now the revmap is not stored in a
> >> > separate fork, but together with all the regular data, as explained
> >> > elsewhere in the thread.
> >>
> >> Cool.
> >>
> >> Have you thought more about this comment from Heikki?
> >>
> >> http://www.postgresql.org/message-id/52495DD3.9010809@vmware.com
> >
> > Is there actually a significant usecase behind that wish or just a
> > general demand for being generic? To me it seems fairly unlikely you'd
> > end up with something useful by doing a minmax index over bounding
> > boxes.
> 
> Well, I'm not the guy who does things with geometric data, but I don't
> want to ignore the significant percentage of our users who are.  As
> you must surely know, the GIST implementations for geometric data
> types store bounding boxes on internal pages, and that seems to be
> useful to people.  What is your reason for thinking that it would be
> any less useful in this context?

For me minmax indexes are helpful because they allow generating *small*
'coarse' indexes over large volumes of data. From my pov that's
possible because they don't contain item pointers for every contained
row.
That'll imo work well if there are consecutive rows in the table that
can be summarized into one min/max range. That's quite likely to happen
for common applications of a number of scalar datatypes. But the
likelihood of placing sufficiently many rows with very similar bounding
boxes close together seems much lower in practice. And I think
that's generally the case for operations which can't be well represented
as btree opclasses - the substructure that implies inside a Datum will
make correlation between consecutive rows less likely.

Maybe I've a major intuition failure here though...
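
To make the correlation argument concrete, here is a rough sketch in
Python. It is not taken from the patch. The names summarize,
ranges_to_scan and ROWS_PER_RANGE are hypothetical, and a plain list
stands in for the heap. It keeps one (min, max) pair per block of
consecutive rows and counts how many blocks a range query would still
have to visit, once for physically ordered values and once for the same
values shuffled.

```python
import random


def summarize(values, rows_per_range):
    """Return one (min, max) pair per block of consecutive rows."""
    chunks = (values[i:i + rows_per_range]
              for i in range(0, len(values), rows_per_range))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def ranges_to_scan(summary, lo, hi):
    """Indexes of blocks whose [min, max] overlaps the query interval [lo, hi]."""
    return [i for i, (mn, mx) in enumerate(summary) if mx >= lo and mn <= hi]


N = 10_000
ROWS_PER_RANGE = 100  # assumed block size, chosen only for illustration

# Values stored in insertion order, e.g. a serial or timestamp column.
correlated = list(range(N))

# The same values with no relation between value and physical position.
shuffled = correlated[:]
random.Random(0).shuffle(shuffled)

query = (4_200, 4_250)
corr_hits = ranges_to_scan(summarize(correlated, ROWS_PER_RANGE), *query)
shuf_hits = ranges_to_scan(summarize(shuffled, ROWS_PER_RANGE), *query)

total = N // ROWS_PER_RANGE
print(len(corr_hits), "of", total, "blocks to scan (correlated)")
print(len(shuf_hits), "of", total, "blocks to scan (shuffled)")
```

With ordered data the query touches a single block. With shuffled data
nearly every block's min/max spans the query interval, so the summary
excludes almost nothing. The concern above is that bounding boxes of
neighbouring rows may behave more like the shuffled case.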

> I do also think that a general demand for being generic ought to carry
> some weight.

Agreed. It's always a balancing act. But it's not like this doesn't use a
datatype abstraction concept...

> We have gone to great lengths to make sure that our
> indexing can handle more than just < and >, where a lot of other
> products have not bothered.  I think we have gotten a lot of mileage
> out of that decision and feel that we shouldn't casually back away
> from it.

I don't see this as a case of backing away from that though?

> we shouldn't accept a less-generic
> approach blindly, without questioning whether it's possible to do
> better.

But the aim shouldn't be to add genericity that's not going to be used,
but to add it where it's somewhat likely to help...

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


