Optimize ORDER BY ... LIMIT - Mailing list pgsql-hackers

From Gregory Stark
Subject Optimize ORDER BY ... LIMIT
Date
Msg-id 87venpozx8.fsf@enterprisedb.com
Whole thread Raw
Responses Re: Optimize ORDER BY ... LIMIT  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Optimize ORDER BY ... LIMIT  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
I've been looking at doing the following TODO item:
   Allow ORDER BY ... LIMIT # to select high/low value without sort or index   using a sequential scan for
highest/lowestvalues
 
   Right now, if no index exists, ORDER BY ... LIMIT # requires we sort all   values to return the high/low value.
InsteadThe idea is to do a   sequential scan to find the high/low value, thus avoiding the sort.   MIN/MAX already does
this,but not for LIMIT > 1.
 

I think this is pretty important to cover at some point because really _not_
doing this just wrong. We're simply not supporting the correct plan for this
type of query. Currently we're doing a O(nlogn) plan when the right plan would
usually be O(n). (As in, it's actually O(nlogm) where m is usually small and
not interesting).

The way I see to do this is to still use a Sort node and use a tuplesort but
to arrange to get the information of the maximum number of tuples needed to
the tuplesort so it can throw out tuples as it sorts.

My plan is to have tuplesort reuse the existing heap code it uses for tape
merges only keep the memtuples array in a max-heap (instead of the min-heap it
uses now -- that means having a tuplesortstate flag indicating which order and
having the tuplesort_heap* functions check that flag). When it reaches the
limit it can throw away either the new element or the top element on every
insert.

I considered using a simple insertion-sort instead but was worried that the
performance would degrade as the limit clause grows large. I don't think
that's a major use case but I don't like the idea of a O(n^2) algorithm lying
in wait to ambush someone.

Also, because heap sort is slower than qsort (on average anyways) it makes
sense to not bother with the heap until the number of tuples grows well beyond
the limit or until it would otherwise spill to disk.

To actually get the information to the tuplesort the information has to be fed
down to the SortState from the LimitState somehow. This I'm not sure how to
do. There isn't currently any abstract interface between nodes to pass
information like this.

The simple solution is that ExecLimit could just peek at its outerPlanState
and if it's a SortState it can set some fields so the SortState can know to
pass the information to the tuplesort.

I've also considered a more abstract interface such as adding an ExecAdvise()
function that would pass some sort of structure (an node?) down with the
information. This seems like overkill for a single integer but I wonder if
there would be other consumers of such an interface. 

The current eflags could be turned swallowed by this, though I don't see any
particular advantage. More realistically a Unique node could also inform a
Sort node that it can throw away duplicates as it sorts. A limit could even be
passed *through* a unique node as long as the Sort understands how to handle
the combination properly. In other areas, a Hash Aggregate can start throw
away elements once the number of elements in the hash grows to the limit.

Alternatively we could have Limit(Sort()), Unique(Sort()), and
Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not
introduce the Limit and Unique nodes at all. I would worry about duplicated
code in that case though, in particular it seems like there would be cases
where we still want to use qsort rather than throw away unneeded tuples. But
not throwing away unneeded tuples means reimplementing all of nodeLimit in
nodeSort for those cases. And that doesn't help with other cases like
Hash Aggregate.

Or am I overthinking this and having some state nodes peek inside other state
nodes is normal?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: "Ricardo Malafaia"
Date:
Subject: Fwd: polite request about syntax
Next
From: Douglas McNaught
Date:
Subject: Re: Fwd: polite request about syntax