Thread: BRIN indexes for MAX, MIN, ORDER BY?

BRIN indexes for MAX, MIN, ORDER BY?

From
Gavin Wahl
Date:
It seems trivial to accelerate a MAX or MIN query with a BRIN index. You just find the page range with the largest/smallest value, and then only scan that one. Would that be hard to implement? I'm interested in working on it if someone can give me some pointers.

Somewhat harder but still possible would be using BRIN indexes to accelerate ORDER BY. This would require a sorting algorithm that can take advantage of mostly-sorted inputs. You would sort the page ranges by their minimum or maximum value, then feed the sorting algorithm in that order.

Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Alvaro Herrera
Date:
Gavin Wahl wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only
> scan that one. Would that be hard to implement? I'm interested in working
> on it if someone can give me some pointers.

I think this means you need to represent this operation as a specific
Path in some way.  See build_minmax_path() and its callers in planagg;
you probably need to tweak preprocess_minmax_aggregates() to consider
this.

This doesn't look like a simple project to me, mind.

> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.

I wouldn't know where to start for this.  Maybe once Tom is done with
planner rejiggering it would make sense to consider looking at how to do
it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Thomas Munro
Date:
On Mon, Sep 28, 2015 at 9:58 AM, Gavin Wahl <gavinwahl@gmail.com> wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only scan
> that one.

You might need to scan more than that if you don't find any rows that
are visible.

> Would that be hard to implement? I'm interested in working on it
> if someone can give me some pointers.
>
> Somewhat harder but still possible would be using BRIN indexes to accelerate
> ORDER BY. This would require a sorting algorithm that can take advantage of
> mostly-sorted inputs. You would sort the page ranges by their minimum or
> maximum value, then feed the sorting algorithm in that order.

Currently you get a Bitmap Index Scan and Bitmap Heap Scan, and then a
Sort node (quicksort or external sort).  So the sort is already
receiving data sorted in BRIN-block order, isn't it?  Are you saying
that the sort node should switch to something like insertion sort in
this case?

http://www.sorting-algorithms.com/nearly-sorted-initial-order

-- 
Thomas Munro
http://www.enterprisedb.com



Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Gavin Wahl wrote:
>> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
>> just find the page range with the largest/smallest value, and then only
>> scan that one. Would that be hard to implement? I'm interested in working
>> on it if someone can give me some pointers.

> I think this means you need to represent this operation as a specific
> Path in some way.  See build_minmax_path() and its callers in planagg;
> you probably need to tweak preprocess_minmax_aggregates() to consider
> this.

> This doesn't look like a simple project to me, mind.

>> Somewhat harder but still possible would be using BRIN indexes to
>> accelerate ORDER BY. This would require a sorting algorithm that can take
>> advantage of mostly-sorted inputs. You would sort the page ranges by their
>> minimum or maximum value, then feed the sorting algorithm in that order.

> I wouldn't know where to start for this.  Maybe once Tom is done with
> planner rejiggering it would make sense to consider looking at how to do
> it.

Yeah.  I would urgently recommend that people *not* try to build new
things like planagg.c right now.  A large part of the point of upper
planner path-ification is to have a less grotty way of dealing with
things like specialized aggregate implementations.

(And yes, I am working on that.  Honest.)
        regards, tom lane



Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Gavin Wahl
Date:
> Yeah.  I would urgently recommend that people *not* try to build new
> things like planagg.c right now.  A large part of the point of upper
> planner path-ification is to have a less grotty way of dealing with
> things like specialized aggregate implementations.

Ok. I will wait and ask again later.

Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Marti Raudsepp
Date:
Hi Gavin

Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com

In particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.

> *partial-knn-1.patch*
>
> KNN-GiST provides ability to get ordered results from index, but this order
> is based only on index information. For instance, GiST index contains
> bounding rectangles for polygons, and we can't get exact distance to
> polygon from index (similar situation is in PostGIS). In attached patch,
> GiST distance method can set recheck flag (similar to consistent method).
> This flag means that distance method returned lower bound of distance and
> we should recheck it from heap.

Unfortunatley this work has stalled.

Regards,
Marti

On Sun, Sep 27, 2015 at 11:58 PM, Gavin Wahl <gavinwahl@gmail.com> wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only scan
> that one. Would that be hard to implement? I'm interested in working on it
> if someone can give me some pointers.
>
> Somewhat harder but still possible would be using BRIN indexes to accelerate
> ORDER BY. This would require a sorting algorithm that can take advantage of
> mostly-sorted inputs. You would sort the page ranges by their minimum or
> maximum value, then feed the sorting algorithm in that order.



Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Heikki Linnakangas
Date:
On 09/28/2015 05:28 PM, Marti Raudsepp wrote:
> Note that Alexander Korotkov already started work in 2013 on a
> somewhat similar feature called partial sort:
> http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
>
> In particular, see the 2nd patch for KNN sort -- it uses known
> bounding boxes from the GiST index for sorting, which in many ways is
> similar to min/max BRIN ranges.
>
>> *partial-knn-1.patch*
>>
>> KNN-GiST provides ability to get ordered results from index, but this order
>> is based only on index information. For instance, GiST index contains
>> bounding rectangles for polygons, and we can't get exact distance to
>> polygon from index (similar situation is in PostGIS). In attached patch,
>> GiST distance method can set recheck flag (similar to consistent method).
>> This flag means that distance method returned lower bound of distance and
>> we should recheck it from heap.
>
> Unfortunatley this work has stalled.

No, that was actually committed for 9.5. It's this item in the release 
notes:

> Allow queries to perform accurate distance filtering of
> bounding-box-indexed objects (polygons, circles) using GiST indexes
> (Alexander Korotkov, Heikki Linnakangas)
>
> Previously, a common table expression was required to return a large
> number of rows ordered by bounding-box distance, and then filtered
> further with a more accurate non-bounding-box distance calculation.

- Heikki



Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Jeremy Harris
Date:
On 27/09/15 21:58, Gavin Wahl wrote:
> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.

An internal merge sort does well with partially-sorted input.
-- 
Cheers,Jeremy





Re: BRIN indexes for MAX, MIN, ORDER BY?

From
Simon Riggs
Date:
On 29 September 2015 at 13:20, Jeremy Harris <jgh@wizmail.org> wrote:
On 27/09/15 21:58, Gavin Wahl wrote:
> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.

An internal merge sort does well with partially-sorted input.

Yes, the $SUBJECT would be a valid use for BRIN.

And also in general for sorts, leading to an optimization of merge joins using BRIN indexes.

All this was foreseen in the original design; if it really was trivial it would be in 9.5 already...

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services