Thread: Limited Sort

Limited Sort

From

Gregory Stark

Date:

18 September 2006, 10:26:41


So I have a quick prototype of this and in fact it handles the common use case
of a paged web page sorted on non-indexed columns very well. If you have only
a small limit like most web pages often avoids external sorts and produces
results 10-20x faster or more. Obviously by raising the size of the input data
set and lowering the limit you can get an arbitrarily large speedup.

The changes in tuplesort are really minimal. I refactored heapify since both
the regular external sort and the bounded sort needed it, but if it hadn't
been for that I think only about 25 lines would actually be added to tuplesort
and only a handful changed.


The part that's still bugging me is how nodeLimit communicates to nodeSort.
What I've done for now is below. It bothers me that nodeLimit is peeking into
nodeSort though. In particular it seems short sighted since nodeSort is far
from the only node that could benefit from this information. If there's a
HashAggregate for example it could stop creating new hash entries when the
limit is reached. So it seems worthwhile having some sort of abstract API
where other nodes could handle the information if they want without having to
teach nodeLimit about every other node type that can.


Index: src/backend/executor/nodeLimit.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeLimit.c,v
retrieving revision 1.27
diff -c -r1.27 nodeLimit.c
*** src/backend/executor/nodeLimit.c    26 Jul 2006 19:31:50 -0000    1.27
--- src/backend/executor/nodeLimit.c    14 Sep 2006 14:53:39 -0000
***************
*** 270,275 ****
--- 270,295 ----     /* Reset position to start-of-scan */     node->position = 0;     node->subSlot = NULL;
+ 
+     /* This is a bit of a kluge 
+      *
+      * I'm not aware of any abstract way to communicate between the two nodes.
+      * Perhaps I should add a new method in nodeSort like ExecAdviseSort and
+      * have a dispatcher in ExecNode that only has one branch. It isn't likely
+      * to be abstract enough to handle other forms of "advice" though.
+      */
+     if (!node->noCount && (IsA(outerPlanState(node), SortState)))
+     {
+         SortState *sortState = (SortState*)outerPlanState(node);
+         int64 tuples_needed = node->count + node->offset;
+ 
+         /* check for overflow */
+         if (tuples_needed < 0)
+             return;
+             
+         sortState->bounded = true;
+         sortState->bound = tuples_needed;
+     } }



I also think it might be easy to do what Martin suggested and trim the tapes
to the limit as well. I can't see a big use case for this since you would have
to have an extremely large input set to make it kick in before the final merge
but there's no reason not to do it either. I don't think it adds any
significant complexity.

And lastly I find the idea of putting attention into OLAP functionality
interesting. Does anyone have any ideas on where to start with that?



--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: Limited Sort

From

Simon Riggs

Date:

18 September 2006, 12:31:22

On Mon, 2006-09-18 at 14:26 +0100, Gregory Stark wrote:

> And lastly I find the idea of putting attention into OLAP functionality
> interesting. Does anyone have any ideas on where to start with that?

In SQL:2003 the OLAP functionality mostly relies on the concept of
sorted Windows over which "Windowed Aggregates" are defined.

Windows can be defined as ( CURRENT ROW | ( UNBOUNDED | n ) PRECEDING )[( UNBOUNDED | n ) FOLLOWING]

e.g. 

current situation: CURRENT ROW

running average over last 10 rows10 PRECEDING

running totalUNBOUNDED PRECEDING

The logic is set for the Window and doesn't make much sense to put that
in the aggregate. I imagine that we would do this by allowing the sort
results to be held as an internal cursor, so we can rewind by the
appropriate number. If preceding sort was an external sort, then we use
a tuplestore, modified so that it acts as a fixed-length FIFO queue so
we retain lookback capability.

n FOLLOWING would be emulated by fetching n rows ahead so that the
cursor was set at the correct place before starting calculation.

UNBOUNDED PRECEDING can be handled by caching the output of the last
final function in the query state, so that we do not need to have a
lookback capability at all in that case. (This may need to have an
additional function type definition, since it won't work with all
functions). e.g.

select     val, sum(val) over (order by val rows unbounded preceding) 
from foo;

val    running_SUM_val
1    1        we cache final_func output =1
2    3        =2+1, we then cache it again=3
3    6        =3+3, we then cache it again=6
4    10        =4+6

OLAP is a lot more complex than I've made out, but the only other thing
we need in tuplesort.c to support Windowed aggregates is:ORDER BY foo (NULLS FIRST | NULLS LAST)
which we cannot yet specify. 

I'm not planning to work on this just yet, but there's always next year.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com

Re: Limited Sort

From

Teodor Sigaev

Date:

18 September 2006, 14:21:59

When we was talking about optimizing 'ORDER BY .. LIMIT' with Oleg and Alvaro at 
conference, we was thinking to make new executor's node - Partial Sort. And 
planner may choose which nodes to use: nodeSort and nodeLimit or nodePartialSort.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/

Re: Limited Sort

From

Gregory Stark

Date:

18 September 2006, 15:38:48

Teodor Sigaev <teodor@sigaev.ru> writes:

> When we was talking about optimizing 'ORDER BY .. LIMIT' with Oleg and Alvaro
> at conference, we was thinking to make new executor's node - Partial Sort. And
> planner may choose which nodes to use: nodeSort and nodeLimit or
> nodePartialSort.

That's an interesting option too. Certainly it preserves the kind of
separation between nodes that we have currently. It also might make costing
the plan more practical.

The downside would be that we would start having a proliferation of nodes.
Several node types can take advantage of a limit. 

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com