Thread: LIMIT Optimization
Hi, DB2, Sql Server and Oracle have a smart optimization sql-clause (DB2 example): SELECT * FROM <table> WHERE <cond> ORDER BY <order> OPTMIZATION FOR n ROWS The [OPTMIZATION FOR] clause turns the query fast by optimize the first "n" rows. If the query returns more than "n" rows, the query is slowest if compared with a normal query, but it does not have the limitation of PostgreSQL's LIMIT clause. I think that clause performs the search twice: one for optimize and other if the # of rows is great then "n". It's possible to include this in future releases of PostgreSQL ???? Thanks all! // ...do not attemp to my harmless english... :) Alexandre Arruda Paes
alexandre paes :: aldeia digital wrote: > Hi, > > DB2, Sql Server and Oracle have a smart optimization sql-clause (DB2 > example): > > SELECT * FROM <table> WHERE <cond> ORDER BY <order> OPTMIZATION FOR n ROWS > > The [OPTMIZATION FOR] clause turns the query fast by optimize the first "n" > rows. > If the query returns more than "n" rows, the query is slowest if compared > with a normal > query, but it does not have the limitation of PostgreSQL's LIMIT clause. > > I think that clause performs the search twice: one for optimize and other if > the # of rows > is great then "n". > > It's possible to include this in future releases of PostgreSQL ???? So it forces our LIMIT optimization, without limiting the number of rows returned. That seems to be of questionable value. The only value I can see for it is for CURSOR queries but I don't think we can start returning rows from even a cursor until the entire query is done executing. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > So it forces our LIMIT optimization, without limiting the number of rows > returned. That seems to be of questionable value. The only value I can > see for it is for CURSOR queries but I don't think we can start > returning rows from even a cursor until the entire query is done > executing. Not so at all. However, CURSORs are already optimized on the assumption that not all the rows will actually get fetched. Awhile back, someone (Hiroshi, I think) suggested that there should be a user-accessible SET variable to control the percentage of rows of a cursor that the optimizer would optimize for fetching. (Right now, IIRC, the estimate is fixed at 10%.) This is a good idea, and quite trivial to do, but no one got around to it yet. I think that such a variable would answer this issue fully, except perhaps for not duplicating the nonstandard syntax used by the other products. Do they really all use exactly the same syntax for this? regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > So it forces our LIMIT optimization, without limiting the number of rows > > returned. That seems to be of questionable value. The only value I can > > see for it is for CURSOR queries but I don't think we can start > > returning rows from even a cursor until the entire query is done > > executing. > > Not so at all. However, CURSORs are already optimized on the assumption > that not all the rows will actually get fetched. So, it is only valuable for cursors or can we return results before the entire query finishes for non-cursors. The "No so at all" part has me confused. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > So, it is only valuable for cursors Right; I can see no rhyme or reason for attaching it to a plain SELECT, unless the system provides some way of stopping short of fetching all of a SELECT, which we don't. A CURSOR would be the analogous thing in PG. regards, tom lane
On Fri, 25 Jan 2002, Bruce Momjian wrote: > alexandre paes :: aldeia digital wrote: > > Hi, > > > > DB2, Sql Server and Oracle have a smart optimization sql-clause (DB2 > > example): > > > > SELECT * FROM <table> WHERE <cond> ORDER BY <order> OPTMIZATION FOR n ROWS > > > > The [OPTMIZATION FOR] clause turns the query fast by optimize the first "n" > > rows. > > If the query returns more than "n" rows, the query is slowest if compared > > with a normal > > query, but it does not have the limitation of PostgreSQL's LIMIT clause. > > > > I think that clause performs the search twice: one for optimize and other if > > the # of rows > > is great then "n". > > > > It's possible to include this in future releases of PostgreSQL ???? > > So it forces our LIMIT optimization, without limiting the number of rows > returned. That seems to be of questionable value. The only value I can > see for it is for CURSOR queries but I don't think we can start > returning rows from even a cursor until the entire query is done > executing. if I'm not mistaken, it's called partial sorting, when you stop sorting process after getting desired number of rows specified by LIMIT clause. it's extremely friendly for web applications, because 90% of users just read the first page of results. We already discussed this feature sometime during 7.1 dev and even made very crude patch. In our tests we got performance win of factor 5-6 ( getting first 100 row from 1mln ). We hope sometime we'll return to this. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Hi, Syntax of the "optimization" in other plataforms: DB2 : [Optimization For n Rows] ( [Fetch-First n ] = PostgreSQL's LIMIT ) Oracle: [/*fastfirstrows*/] SQL Server: [FastFirstRows] - version 6.x // [FAST(n)] - version >= 2000 I simulate the "optmization" in PostgreSQL using: 1. SELECT * FROM <table> WHERE <cond> ORDER BY <order> LIMIT 10 (Explain this query show a Index Scan) 2. Loop in resultset. If the <cond> is break in <= 10 rows continue, else 3. Repeat the query without LIMIT The times of queries in DB2 and PostgreSQL are same doing it. ------------ Alexandre ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Bruce Momjian" <pgman@candle.pha.pa.us> Cc: "alexandre paes :: aldeia digital" <alepaes@aldeiadigital.com.br>; <pgsql-sql@postgresql.org> Sent: Friday, January 25, 2002 3:58 PM Subject: Re: [SQL] LIMIT Optimization > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > So, it is only valuable for cursors > > Right; I can see no rhyme or reason for attaching it to a plain SELECT, > unless the system provides some way of stopping short of fetching all of > a SELECT, which we don't. A CURSOR would be the analogous thing in PG. > > regards, tom lane
> if I'm not mistaken, it's called partial sorting, when you stop > sorting process after getting desired number of rows specified by LIMIT clause. > it's extremely friendly for web applications, because 90% of users > just read the first page of results. We already discussed this feature > sometime during 7.1 dev and even made very crude patch. In our tests we > got performance win of factor 5-6 ( getting first 100 row from 1mln ). > We hope sometime we'll return to this. But we already have cursor's assuming 10% return. Is allowing that value to be changed using SET an acceptable solution? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Fri, 25 Jan 2002, Bruce Momjian wrote: > > if I'm not mistaken, it's called partial sorting, when you stop > > sorting process after getting desired number of rows specified by LIMIT clause. > > it's extremely friendly for web applications, because 90% of users > > just read the first page of results. We already discussed this feature > > sometime during 7.1 dev and even made very crude patch. In our tests we > > got performance win of factor 5-6 ( getting first 100 row from 1mln ). > > We hope sometime we'll return to this. > > But we already have cursor's assuming 10% return. Is allowing that > value to be changed using SET an acceptable solution? Bruce, web applicationa are stateless, so forget about cursors. I wrote about obvious optimization which is valid for very narrow case, but which is widely spreaded. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote: > On Fri, 25 Jan 2002, Bruce Momjian wrote: > > > > if I'm not mistaken, it's called partial sorting, when you stop > > > sorting process after getting desired number of rows specified by LIMIT clause. > > > it's extremely friendly for web applications, because 90% of users > > > just read the first page of results. We already discussed this feature > > > sometime during 7.1 dev and even made very crude patch. In our tests we > > > got performance win of factor 5-6 ( getting first 100 row from 1mln ). > > > We hope sometime we'll return to this. > > > > But we already have cursor's assuming 10% return. Is allowing that > > value to be changed using SET an acceptable solution? > > > Bruce, web applicationa are stateless, so forget about cursors. > I wrote about obvious optimization which is valid for very narrow case, > but which is widely spreaded. I am confused. I thought we already did optimization for LIMIT that assumed you only wanted a few values. Is there something we are missing there? I thought the disucssion related only to people using cursors and fetching only some of the data. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I am confused. I thought we already did optimization for LIMIT that > assumed you only wanted a few values. Is there something we are missing > there? Yeah, he was proposing an alternative implementation of sorting that would win in a scenario like SELECT ... ORDER BY foo LIMIT <something small> If you have an index on foo then there's no problem, but if you're forced to do an explicit sort then the system does a complete sort before you can get any data out. If the limit is small enough you can instead do a one-pass "select top N" scan. Note that this is only workable in the non-cursor case, where you know the limit for sure. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I am confused. I thought we already did optimization for LIMIT that > > assumed you only wanted a few values. Is there something we are missing > > there? > > Yeah, he was proposing an alternative implementation of sorting that > would win in a scenario like > > SELECT ... ORDER BY foo LIMIT <something small> > > If you have an index on foo then there's no problem, but if you're > forced to do an explicit sort then the system does a complete sort > before you can get any data out. If the limit is small enough you > can instead do a one-pass "select top N" scan. > > Note that this is only workable in the non-cursor case, where you > know the limit for sure. Oh, boy, so we would scan through and grab the top X value from the table without a sort. Interesting. Add to TODO: Allow ORDER BY ... LIMIT to select top values without sort or index -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Sat, Jan 26, 2002 at 11:19:21PM -0500, Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > I am confused. I thought we already did optimization for LIMIT that > > > assumed you only wanted a few values. Is there something we are missing > > > there? > > > > Yeah, he was proposing an alternative implementation of sorting that > > would win in a scenario like > > > > SELECT ... ORDER BY foo LIMIT <something small> > > > > If you have an index on foo then there's no problem, but if you're > > forced to do an explicit sort then the system does a complete sort > > before you can get any data out. If the limit is small enough you > > can instead do a one-pass "select top N" scan. > > > > Note that this is only workable in the non-cursor case, where you > > know the limit for sure. > > Oh, boy, so we would scan through and grab the top X value from the > table without a sort. Interesting. Add to TODO: > > Allow ORDER BY ... LIMIT to select top values without sort or index Note that it's not as big a win as one might think at first, since you stil have to scan the entire table to make sure that last tuple isn't in the LIMIT in the sort order. Big (potential) savingings in sort space storage, however. And you're O(N) compares, rather than anything larger. Ross
On Sun, 27 Jan 2002, Ross J. Reedstrom wrote: > On Sat, Jan 26, 2002 at 11:19:21PM -0500, Bruce Momjian wrote: > > Tom Lane wrote: > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > > I am confused. I thought we already did optimization for LIMIT that > > > > assumed you only wanted a few values. Is there something we are missing > > > > there? > > > > > > Yeah, he was proposing an alternative implementation of sorting that > > > would win in a scenario like > > > > > > SELECT ... ORDER BY foo LIMIT <something small> > > > > > > If you have an index on foo then there's no problem, but if you're > > > forced to do an explicit sort then the system does a complete sort > > > before you can get any data out. If the limit is small enough you > > > can instead do a one-pass "select top N" scan. > > > > > > Note that this is only workable in the non-cursor case, where you > > > know the limit for sure. > > > > Oh, boy, so we would scan through and grab the top X value from the > > table without a sort. Interesting. Add to TODO: > > > > Allow ORDER BY ... LIMIT to select top values without sort or index > > Note that it's not as big a win as one might think at first, since you > stil have to scan the entire table to make sure that last tuple isn't > in the LIMIT in the sort order. Big (potential) savingings in sort space > storage, however. And you're O(N) compares, rather than anything larger. It's still a big win ! think about difference between full sorting of 1mln rows and partial sorting when you stop sorting after getting desirable first 100 (or so) rows. This is most common situation in web applications at least - you always display search results page by page. But statistics shows that 90% of hits is the first 1-2 pages. If we intend to be more friendly and compete with MySQL on Web apps. we should consider this optimization. We had quick and dirty patch for 7.1 but it was just a sketch and we didn't surprised core developers reject it. Now we have libpsort - a library which implements partial sorting, and we use it extensively in our apps. I think we should add partial sorting in TODO list for 7.3. > > Ross > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
On Mon, Jan 28, 2002 at 04:26:44PM +0300, Oleg Bartunov wrote: > On Sun, 27 Jan 2002, Ross J. Reedstrom wrote: > > It's still a big win ! think about difference between full sorting of 1mln > rows and partial sorting when you stop sorting after getting desirable > first 100 (or so) rows. This is most common situation in web applications Right - my reason for commenting was the loose language: _you_ know and I know and Bruce knows and Tom knows what 'stop sorting after getting the first 100 rows' means, but for the sake of the archives, and TODO list, a clearer wording might be in order. > at least - you always display search results page by page. But statistics > shows that 90% of hits is the first 1-2 pages. If we intend to be more > friendly and compete with MySQL on Web apps. we should consider this > optimization. We had quick and dirty patch for 7.1 but it was just a > sketch and we didn't surprised core developers reject it. Now we have > libpsort - a library which implements partial sorting, and we use it > extensively in our apps. I think we should add partial sorting > in TODO list for 7.3. Agreed it's a good idea - maybe post your 'quick & dirty' to patches 'for discussion only' as an example for people looking at the TODO for things to try. We've had a few of those (people looking, that is) in the last couple weeks. Ross
i have a query that goes through a bespoke middle tier we have written in delphi that takes 1.5 seconds to return 25 results (limit 25). when i order the query it takes 9.5 seconds. i'm presuming that the entire resultset (some 5000 records) is ordered before the limit is applied. is this correct and if so is there any way around it ? TIA Steve "Oleg Bartunov" <oleg@sai.msu.su> wrote in message news:Pine.GSO.4.33.0201281615350.7250-100000@ra.sai.msu.su... > On Sun, 27 Jan 2002, Ross J. Reedstrom wrote: > > > On Sat, Jan 26, 2002 at 11:19:21PM -0500, Bruce Momjian wrote: > > > Tom Lane wrote: > > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > > > I am confused. I thought we already did optimization for LIMIT that > > > > > assumed you only wanted a few values. Is there something we are missing > > > > > there? > > > > > > > > Yeah, he was proposing an alternative implementation of sorting that > > > > would win in a scenario like > > > > > > > > SELECT ... ORDER BY foo LIMIT <something small> > > > > > > > > If you have an index on foo then there's no problem, but if you're > > > > forced to do an explicit sort then the system does a complete sort > > > > before you can get any data out. If the limit is small enough you > > > > can instead do a one-pass "select top N" scan. > > > > > > > > Note that this is only workable in the non-cursor case, where you > > > > know the limit for sure. > > > > > > Oh, boy, so we would scan through and grab the top X value from the > > > table without a sort. Interesting. Add to TODO: > > > > > > Allow ORDER BY ... LIMIT to select top values without sort or index > > > > Note that it's not as big a win as one might think at first, since you > > stil have to scan the entire table to make sure that last tuple isn't > > in the LIMIT in the sort order. Big (potential) savingings in sort space > > storage, however. And you're O(N) compares, rather than anything larger. > > It's still a big win ! think about difference between full sorting of 1mln > rows and partial sorting when you stop sorting after getting desirable > first 100 (or so) rows. This is most common situation in web applications > at least - you always display search results page by page. But statistics > shows that 90% of hits is the first 1-2 pages. If we intend to be more > friendly and compete with MySQL on Web apps. we should consider this > optimization. We had quick and dirty patch for 7.1 but it was just a > sketch and we didn't surprised core developers reject it. Now we have > libpsort - a library which implements partial sorting, and we use it > extensively in our apps. I think we should add partial sorting > in TODO list for 7.3. > > > > > Ross > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 5: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/users-lounge/docs/faq.html > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly