Optimize ORDER BY ... LIMIT - Mailing list pgsql-hackers
From | Gregory Stark |
---|---|
Subject | Optimize ORDER BY ... LIMIT |
Date | |
Msg-id | 87venpozx8.fsf@enterprisedb.com Whole thread Raw |
Responses |
Re: Optimize ORDER BY ... LIMIT
Re: Optimize ORDER BY ... LIMIT |
List | pgsql-hackers |
I've been looking at doing the following TODO item: Allow ORDER BY ... LIMIT # to select high/low value without sort or index using a sequential scan for highest/lowestvalues Right now, if no index exists, ORDER BY ... LIMIT # requires we sort all values to return the high/low value. InsteadThe idea is to do a sequential scan to find the high/low value, thus avoiding the sort. MIN/MAX already does this,but not for LIMIT > 1. I think this is pretty important to cover at some point because really _not_ doing this just wrong. We're simply not supporting the correct plan for this type of query. Currently we're doing a O(nlogn) plan when the right plan would usually be O(n). (As in, it's actually O(nlogm) where m is usually small and not interesting). The way I see to do this is to still use a Sort node and use a tuplesort but to arrange to get the information of the maximum number of tuples needed to the tuplesort so it can throw out tuples as it sorts. My plan is to have tuplesort reuse the existing heap code it uses for tape merges only keep the memtuples array in a max-heap (instead of the min-heap it uses now -- that means having a tuplesortstate flag indicating which order and having the tuplesort_heap* functions check that flag). When it reaches the limit it can throw away either the new element or the top element on every insert. I considered using a simple insertion-sort instead but was worried that the performance would degrade as the limit clause grows large. I don't think that's a major use case but I don't like the idea of a O(n^2) algorithm lying in wait to ambush someone. Also, because heap sort is slower than qsort (on average anyways) it makes sense to not bother with the heap until the number of tuples grows well beyond the limit or until it would otherwise spill to disk. To actually get the information to the tuplesort the information has to be fed down to the SortState from the LimitState somehow. This I'm not sure how to do. There isn't currently any abstract interface between nodes to pass information like this. The simple solution is that ExecLimit could just peek at its outerPlanState and if it's a SortState it can set some fields so the SortState can know to pass the information to the tuplesort. I've also considered a more abstract interface such as adding an ExecAdvise() function that would pass some sort of structure (an node?) down with the information. This seems like overkill for a single integer but I wonder if there would be other consumers of such an interface. The current eflags could be turned swallowed by this, though I don't see any particular advantage. More realistically a Unique node could also inform a Sort node that it can throw away duplicates as it sorts. A limit could even be passed *through* a unique node as long as the Sort understands how to handle the combination properly. In other areas, a Hash Aggregate can start throw away elements once the number of elements in the hash grows to the limit. Alternatively we could have Limit(Sort()), Unique(Sort()), and Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not introduce the Limit and Unique nodes at all. I would worry about duplicated code in that case though, in particular it seems like there would be cases where we still want to use qsort rather than throw away unneeded tuples. But not throwing away unneeded tuples means reimplementing all of nodeLimit in nodeSort for those cases. And that doesn't help with other cases like Hash Aggregate. Or am I overthinking this and having some state nodes peek inside other state nodes is normal? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: