Thread: LIMIT Optimization

LIMIT Optimization

From
"alexandre paes :: aldeia digital"
Date:
Hi,

DB2, Sql Server and Oracle have a smart optimization sql-clause (DB2
example):

SELECT * FROM <table> WHERE <cond> ORDER BY <order> OPTMIZATION FOR n ROWS

The [OPTMIZATION FOR] clause turns the query fast by optimize the first "n"
rows.
If the query returns more than "n" rows, the query is slowest if compared
with a normal
query, but it does not have the limitation of PostgreSQL's LIMIT clause.

I think that clause performs the search twice: one for optimize and other if
the # of rows
is great then "n".

It's possible to include this in future releases of PostgreSQL ????


Thanks all!   // ...do not attemp to my harmless english... :)



Alexandre Arruda Paes



Re: LIMIT Optimization

From
Bruce Momjian
Date:
alexandre paes :: aldeia digital wrote:
> Hi,
> 
> DB2, Sql Server and Oracle have a smart optimization sql-clause (DB2
> example):
> 
> SELECT * FROM <table> WHERE <cond> ORDER BY <order> OPTMIZATION FOR n ROWS
> 
> The [OPTMIZATION FOR] clause turns the query fast by optimize the first "n"
> rows.
> If the query returns more than "n" rows, the query is slowest if compared
> with a normal
> query, but it does not have the limitation of PostgreSQL's LIMIT clause.
> 
> I think that clause performs the search twice: one for optimize and other if
> the # of rows
> is great then "n".
> 
> It's possible to include this in future releases of PostgreSQL ????

So it forces our LIMIT optimization, without limiting the number of rows
returned.  That seems to be of questionable value.  The only value I can
see for it is for CURSOR queries but I don't think we can start
returning rows from even a cursor until the entire query is done
executing.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: LIMIT Optimization

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> So it forces our LIMIT optimization, without limiting the number of rows
> returned.  That seems to be of questionable value.  The only value I can
> see for it is for CURSOR queries but I don't think we can start
> returning rows from even a cursor until the entire query is done
> executing.

Not so at all.  However, CURSORs are already optimized on the assumption
that not all the rows will actually get fetched.

Awhile back, someone (Hiroshi, I think) suggested that there should be
a user-accessible SET variable to control the percentage of rows of a
cursor that the optimizer would optimize for fetching.  (Right now,
IIRC, the estimate is fixed at 10%.)  This is a good idea, and quite
trivial to do, but no one got around to it yet.  I think that such a
variable would answer this issue fully, except perhaps for not
duplicating the nonstandard syntax used by the other products.
Do they really all use exactly the same syntax for this?
        regards, tom lane


Re: LIMIT Optimization

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So it forces our LIMIT optimization, without limiting the number of rows
> > returned.  That seems to be of questionable value.  The only value I can
> > see for it is for CURSOR queries but I don't think we can start
> > returning rows from even a cursor until the entire query is done
> > executing.
> 
> Not so at all.  However, CURSORs are already optimized on the assumption
> that not all the rows will actually get fetched.

So, it is only valuable for cursors or can we return results before the
entire query finishes for non-cursors.  The "No so at all" part has me
confused.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: LIMIT Optimization

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> So, it is only valuable for cursors

Right; I can see no rhyme or reason for attaching it to a plain SELECT,
unless the system provides some way of stopping short of fetching all of
a SELECT, which we don't.  A CURSOR would be the analogous thing in PG.
        regards, tom lane


Re: LIMIT Optimization

From
Oleg Bartunov
Date:
On Fri, 25 Jan 2002, Bruce Momjian wrote:

> alexandre paes :: aldeia digital wrote:
> > Hi,
> >
> > DB2, Sql Server and Oracle have a smart optimization sql-clause (DB2
> > example):
> >
> > SELECT * FROM <table> WHERE <cond> ORDER BY <order> OPTMIZATION FOR n ROWS
> >
> > The [OPTMIZATION FOR] clause turns the query fast by optimize the first "n"
> > rows.
> > If the query returns more than "n" rows, the query is slowest if compared
> > with a normal
> > query, but it does not have the limitation of PostgreSQL's LIMIT clause.
> >
> > I think that clause performs the search twice: one for optimize and other if
> > the # of rows
> > is great then "n".
> >
> > It's possible to include this in future releases of PostgreSQL ????
>
> So it forces our LIMIT optimization, without limiting the number of rows
> returned.  That seems to be of questionable value.  The only value I can
> see for it is for CURSOR queries but I don't think we can start
> returning rows from even a cursor until the entire query is done
> executing.


if I'm not mistaken, it's called partial sorting, when you stop
sorting process after getting desired number of rows specified by LIMIT clause.
it's extremely friendly for web applications, because 90% of users
just read the first page of results. We already discussed this feature
sometime during 7.1 dev and even made very crude patch. In our tests we
got performance win of factor 5-6 ( getting first 100 row from 1mln ).
We hope sometime we'll return to this.


>
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: LIMIT Optimization

From
"alexandre paes :: aldeia digital"
Date:
Hi,

Syntax of the "optimization" in other plataforms:

DB2 : [Optimization For n Rows]  ( [Fetch-First n ] = PostgreSQL's LIMIT )
Oracle: [/*fastfirstrows*/]
SQL Server: [FastFirstRows] - version 6.x  // [FAST(n)] - version >= 2000

I simulate the "optmization" in PostgreSQL using:

1. SELECT * FROM <table> WHERE <cond> ORDER BY <order> LIMIT 10
(Explain this query show a Index Scan)

2. Loop in resultset. If the <cond> is break in <= 10 rows continue, else

3. Repeat the query without LIMIT

The times of queries in DB2 and PostgreSQL are same doing it.

------------

Alexandre


----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "alexandre paes :: aldeia digital" <alepaes@aldeiadigital.com.br>;
<pgsql-sql@postgresql.org>
Sent: Friday, January 25, 2002 3:58 PM
Subject: Re: [SQL] LIMIT Optimization


> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So, it is only valuable for cursors
>
> Right; I can see no rhyme or reason for attaching it to a plain SELECT,
> unless the system provides some way of stopping short of fetching all of
> a SELECT, which we don't.  A CURSOR would be the analogous thing in PG.
>
> regards, tom lane



Re: LIMIT Optimization

From
Bruce Momjian
Date:
> if I'm not mistaken, it's called partial sorting, when you stop
> sorting process after getting desired number of rows specified by LIMIT clause.
> it's extremely friendly for web applications, because 90% of users
> just read the first page of results. We already discussed this feature
> sometime during 7.1 dev and even made very crude patch. In our tests we
> got performance win of factor 5-6 ( getting first 100 row from 1mln ).
> We hope sometime we'll return to this.

But we already have cursor's assuming 10% return.  Is allowing that
value to be changed using SET an acceptable solution?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: LIMIT Optimization

From
Oleg Bartunov
Date:
On Fri, 25 Jan 2002, Bruce Momjian wrote:

> > if I'm not mistaken, it's called partial sorting, when you stop
> > sorting process after getting desired number of rows specified by LIMIT clause.
> > it's extremely friendly for web applications, because 90% of users
> > just read the first page of results. We already discussed this feature
> > sometime during 7.1 dev and even made very crude patch. In our tests we
> > got performance win of factor 5-6 ( getting first 100 row from 1mln ).
> > We hope sometime we'll return to this.
>
> But we already have cursor's assuming 10% return.  Is allowing that
> value to be changed using SET an acceptable solution?


Bruce, web applicationa are stateless, so forget about cursors.
I wrote about obvious optimization which is valid for very narrow case,
but which is widely spreaded.

>
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: LIMIT Optimization

From
Bruce Momjian
Date:
Oleg Bartunov wrote:
> On Fri, 25 Jan 2002, Bruce Momjian wrote:
> 
> > > if I'm not mistaken, it's called partial sorting, when you stop
> > > sorting process after getting desired number of rows specified by LIMIT clause.
> > > it's extremely friendly for web applications, because 90% of users
> > > just read the first page of results. We already discussed this feature
> > > sometime during 7.1 dev and even made very crude patch. In our tests we
> > > got performance win of factor 5-6 ( getting first 100 row from 1mln ).
> > > We hope sometime we'll return to this.
> >
> > But we already have cursor's assuming 10% return.  Is allowing that
> > value to be changed using SET an acceptable solution?
> 
> 
> Bruce, web applicationa are stateless, so forget about cursors.
> I wrote about obvious optimization which is valid for very narrow case,
> but which is widely spreaded.

I am confused.  I thought we already did optimization for LIMIT that 
assumed you only wanted a few values.  Is there something we are missing
there?   I thought the disucssion related only to people using cursors
and fetching only some of the data.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: LIMIT Optimization

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am confused.  I thought we already did optimization for LIMIT that 
> assumed you only wanted a few values.  Is there something we are missing
> there?

Yeah, he was proposing an alternative implementation of sorting that
would win in a scenario like
SELECT ... ORDER BY foo LIMIT <something small>

If you have an index on foo then there's no problem, but if you're
forced to do an explicit sort then the system does a complete sort
before you can get any data out.  If the limit is small enough you
can instead do a one-pass "select top N" scan.

Note that this is only workable in the non-cursor case, where you
know the limit for sure.
        regards, tom lane


Re: LIMIT Optimization

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am confused.  I thought we already did optimization for LIMIT that 
> > assumed you only wanted a few values.  Is there something we are missing
> > there?
> 
> Yeah, he was proposing an alternative implementation of sorting that
> would win in a scenario like
> 
>     SELECT ... ORDER BY foo LIMIT <something small>
> 
> If you have an index on foo then there's no problem, but if you're
> forced to do an explicit sort then the system does a complete sort
> before you can get any data out.  If the limit is small enough you
> can instead do a one-pass "select top N" scan.
> 
> Note that this is only workable in the non-cursor case, where you
> know the limit for sure.

Oh, boy, so we would scan through and grab the top X value from the
table without a sort.  Interesting.  Add to TODO:
  Allow ORDER BY ... LIMIT to select top values without sort or index

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: LIMIT Optimization

From
"Ross J. Reedstrom"
Date:
On Sat, Jan 26, 2002 at 11:19:21PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I am confused.  I thought we already did optimization for LIMIT that 
> > > assumed you only wanted a few values.  Is there something we are missing
> > > there?
> > 
> > Yeah, he was proposing an alternative implementation of sorting that
> > would win in a scenario like
> > 
> >     SELECT ... ORDER BY foo LIMIT <something small>
> > 
> > If you have an index on foo then there's no problem, but if you're
> > forced to do an explicit sort then the system does a complete sort
> > before you can get any data out.  If the limit is small enough you
> > can instead do a one-pass "select top N" scan.
> > 
> > Note that this is only workable in the non-cursor case, where you
> > know the limit for sure.
> 
> Oh, boy, so we would scan through and grab the top X value from the
> table without a sort.  Interesting.  Add to TODO:
> 
>    Allow ORDER BY ... LIMIT to select top values without sort or index

Note that it's not as big a win as one might think at first, since you
stil have to scan the entire table to make sure that last tuple isn't
in the LIMIT in the sort order. Big (potential) savingings in sort space
storage, however. And you're O(N) compares, rather than anything larger.

Ross



Re: LIMIT Optimization

From
Oleg Bartunov
Date:
On Sun, 27 Jan 2002, Ross J. Reedstrom wrote:

> On Sat, Jan 26, 2002 at 11:19:21PM -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > > I am confused.  I thought we already did optimization for LIMIT that
> > > > assumed you only wanted a few values.  Is there something we are missing
> > > > there?
> > >
> > > Yeah, he was proposing an alternative implementation of sorting that
> > > would win in a scenario like
> > >
> > >     SELECT ... ORDER BY foo LIMIT <something small>
> > >
> > > If you have an index on foo then there's no problem, but if you're
> > > forced to do an explicit sort then the system does a complete sort
> > > before you can get any data out.  If the limit is small enough you
> > > can instead do a one-pass "select top N" scan.
> > >
> > > Note that this is only workable in the non-cursor case, where you
> > > know the limit for sure.
> >
> > Oh, boy, so we would scan through and grab the top X value from the
> > table without a sort.  Interesting.  Add to TODO:
> >
> >    Allow ORDER BY ... LIMIT to select top values without sort or index
>
> Note that it's not as big a win as one might think at first, since you
> stil have to scan the entire table to make sure that last tuple isn't
> in the LIMIT in the sort order. Big (potential) savingings in sort space
> storage, however. And you're O(N) compares, rather than anything larger.

It's still a big win ! think about difference between full sorting of  1mln
rows and partial sorting when you stop sorting after getting desirable
first 100 (or so) rows. This is most common situation in web applications
at least - you always display search results page by page. But statistics
shows that 90% of hits is the first 1-2 pages. If we intend to be more
friendly and compete with MySQL on Web apps. we should consider this
optimization. We had quick and dirty patch for 7.1 but it was just a
sketch and we didn't surprised core developers reject it. Now we have
libpsort - a library which implements partial sorting, and we use it
extensively  in our apps. I think we should add partial sorting
in TODO list for 7.3.

>
> Ross
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: LIMIT Optimization

From
"Ross J. Reedstrom"
Date:
On Mon, Jan 28, 2002 at 04:26:44PM +0300, Oleg Bartunov wrote:
> On Sun, 27 Jan 2002, Ross J. Reedstrom wrote:
> 
> It's still a big win ! think about difference between full sorting of  1mln
> rows and partial sorting when you stop sorting after getting desirable
> first 100 (or so) rows. This is most common situation in web applications

Right - my reason for commenting was the loose language: _you_ know and I
know and Bruce knows and Tom knows what 'stop sorting after getting the first
100 rows' means, but for the sake of the archives, and TODO list, a clearer
wording might be in order.

> at least - you always display search results page by page. But statistics
> shows that 90% of hits is the first 1-2 pages. If we intend to be more
> friendly and compete with MySQL on Web apps. we should consider this
> optimization. We had quick and dirty patch for 7.1 but it was just a
> sketch and we didn't surprised core developers reject it. Now we have
> libpsort - a library which implements partial sorting, and we use it
> extensively  in our apps. I think we should add partial sorting
> in TODO list for 7.3.

Agreed it's a good idea - maybe post your 'quick & dirty' to patches
'for discussion only' as an example for people looking at the TODO
for things to try. We've had a few of those (people looking, that is)
in the last couple weeks.

Ross


Re: LIMIT Optimization- please release anything you've got ... :-)

From
"Steve Brett"
Date:
i have a query that goes through a bespoke middle tier we have written in
delphi that takes 1.5 seconds to return 25 results (limit 25).

when i order the query it takes 9.5 seconds.

i'm presuming that the entire resultset (some 5000 records) is ordered
before the limit is applied.

is this correct and if so is there any way around it ?

TIA

Steve

"Oleg Bartunov" <oleg@sai.msu.su> wrote in message
news:Pine.GSO.4.33.0201281615350.7250-100000@ra.sai.msu.su...
> On Sun, 27 Jan 2002, Ross J. Reedstrom wrote:
>
> > On Sat, Jan 26, 2002 at 11:19:21PM -0500, Bruce Momjian wrote:
> > > Tom Lane wrote:
> > > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > > > I am confused.  I thought we already did optimization for LIMIT
that
> > > > > assumed you only wanted a few values.  Is there something we are
missing
> > > > > there?
> > > >
> > > > Yeah, he was proposing an alternative implementation of sorting that
> > > > would win in a scenario like
> > > >
> > > > SELECT ... ORDER BY foo LIMIT <something small>
> > > >
> > > > If you have an index on foo then there's no problem, but if you're
> > > > forced to do an explicit sort then the system does a complete sort
> > > > before you can get any data out.  If the limit is small enough you
> > > > can instead do a one-pass "select top N" scan.
> > > >
> > > > Note that this is only workable in the non-cursor case, where you
> > > > know the limit for sure.
> > >
> > > Oh, boy, so we would scan through and grab the top X value from the
> > > table without a sort.  Interesting.  Add to TODO:
> > >
> > >    Allow ORDER BY ... LIMIT to select top values without sort or index
> >
> > Note that it's not as big a win as one might think at first, since you
> > stil have to scan the entire table to make sure that last tuple isn't
> > in the LIMIT in the sort order. Big (potential) savingings in sort space
> > storage, however. And you're O(N) compares, rather than anything larger.
>
> It's still a big win ! think about difference between full sorting of
1mln
> rows and partial sorting when you stop sorting after getting desirable
> first 100 (or so) rows. This is most common situation in web applications
> at least - you always display search results page by page. But statistics
> shows that 90% of hits is the first 1-2 pages. If we intend to be more
> friendly and compete with MySQL on Web apps. we should consider this
> optimization. We had quick and dirty patch for 7.1 but it was just a
> sketch and we didn't surprised core developers reject it. Now we have
> libpsort - a library which implements partial sorting, and we use it
> extensively  in our apps. I think we should add partial sorting
> in TODO list for 7.3.
>
> >
> > Ross
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> > http://www.postgresql.org/users-lounge/docs/faq.html
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly