Re: Optimizing "top queries" ... - Mailing list pgsql-hackers
From | Hans-Juergen Schoenig |
---|---|
Subject | Re: Optimizing "top queries" ... |
Date | |
Msg-id | 9146EC42-39AB-45DE-8380-CDEB963E34D7@cybertec.at Whole thread Raw |
In response to | Re: Optimizing "top queries" ... (Gregory Stark <stark@enterprisedb.com>) |
Responses |
Re: Optimizing "top queries" ...
(Gregory Stark <stark@enterprisedb.com>)
|
List | pgsql-hackers |
i basically thought a node would make more sense as it gives some more flexibility.
making the "replacement strategy" inside the node a bit more fancy this could actually open the door for further optimizations and for other operations.
also, OFFSET would be quite easy as the buffer size needed is perfectly defined by LIMIT + OFFSET.
taking work_mem into consideration we could safely fall back to the old plan if too much data is fetched.
can a node like that be of any further use for other operations as well? i am especially thinking of some other stuff related to analytics.
best regards,
hans
On Dec 6, 2006, at 4:34 PM, Gregory Stark wrote:
"Markus Schiltknecht" <markus@bluegap.ch> writes:Hi,Hans-Juergen Schoenig wrote:in fact, the sort step is not necessary here as we could add a node whichbuffers the highest 10 records and replaces them whenever a higher value isreturned from the underlaying node (in this case seq scan).this query is a quite common scenario when it comes to some analysis relatedissues.saving the sort step is an especially good idea when the table is very large.That sounds very much like what's known as 'partial sort', which has beenproposed by Oleg and Theodor. AFAIK they had a trivial patch sometime aroundversion 7.1, without integration into the planer and optimizer. They weretalking about libpsort, but I can't find that currently. See archives [1] and[2].I actually implemented it again a few months ago during the feature freeze. Ihad a few questions but since it was the middle of the feature freeze I expectpeople had other things on their minds.It is an important form of query since it crops up any time you have a UI(read web page) with a paged result set. Currently postgres has to gather upall the records in the result set and sort them which makes it compare poorlyagainst other databases popular with web site authors...The open question in my patch was how to communicate about the limit down tothe sort node. I had implemented it by having ExecLimit peek into the SortNodeand set a field there.This alternative of making a whole new plan node may have more promise though.It would make it easier to come up with reasonable cost estimates.One thing to keep in mind though is that I also wanted to cover the case ofUnique(Sort(...)) and Limit(Unique(Sort(...))) which can throw away duplicatesearlier. Do we want three different plan nodes? Are there other cases likethese that can benefit?--Gregory Stark---------------------------(end of broadcast)---------------------------TIP 5: don't forget to increase your free space map settings
--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
pgsql-hackers by date: